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1  Motivation 


“I  would  like  to  describe  afield,  in  which  little  has  been  done,  but  in  which  an  enormous 
amount  can  be  done  in  principle.  . . .  Furthermore,  a  point  that  is  most  important  is  that  it 
would  have  an  enormous  number  of  technical  applications.  What  1  want  to  talk  about  is 
the  problem  of  manipulating  and  controlling  things  on  a  small  scale.”  Richard  R  Feynman, 
There’s  Plenty  of  Room  at  the  Bottom,  American  Physical  Society  Caltech,  Dec.  29,  1959 

In  our  quest  for  a  deeper  understanding  of  physical  and  biological  phenomena,  we  move 
into  the  “small  scale”  world  of  quantum  mechanics.  The  rules  of  this  world  herald  new  types 
of  materials  and  devices  [1,  2,  3].  Quantum  information  systems  and  instruments  of  mea¬ 
surement  promise  an  exponential  improvement  in  speed  and/or  resolution  compared  to  their 
classical  counterparts.  Many  of  these  systems  inherently  rely  on  estimation  for  their  normal 
operation,  e.g.,  atomic  clocks,  measuring  electrical,  thermal,  and  photonic  characteristics,  bio¬ 
metrics,  magnetometry,  and  gravimetry.  Some  will  require  estimation  to  determine  if  the  system 
is  meeting  performance  demands  and  then  apply  a  control  adapted  to  the  specific  estimated  er¬ 
ror,  e.g.,  [4,  5,  6,  7,  8].  Estimation  will  also  be  used  for  simply  gaining  an  understanding  of 
observed  phenomena. 

Despite  the  promise  and  various  laboratory  successes,  for  many  of  these  quantum  systems 
ab  initio  models  do  not  yet  exist  which  can  be  used  to  optimize  the  design  or  determine  a  robust 
control  for  actual  application.  The  only  practical  approach  is  quantum  system  identification 
-  that  is  -  identifying  a  model  from  measurements,  either  as  an  intrinsic  part  of  their  oper¬ 
ation  or  in  a  calibration/tuning  stage  prior  to  operation.  In  particular,  instrumentation  noise, 
decoherence,  and  modeling  errors  are  all  sources  of  uncertainty  which  either  separately  or  in 
combination  hinder  the  ability  of  the  device  to  meet  performance  demands.  Finally,  common  to 
all  methods  of  quantum  system  identification,  as  well  as  quantum  control  design  [9,  10],  is  the 
computational  burden  imposed  by  the  dimension  of  the  parameter  space. 


2  What  was  proposed 

We  proposed  to  investigate  a  method  of  identification  which  has  the  potential  to  alleviate  all  the 
aforementioned  problems.  The  question  we  posed  was: 

Can  li-norm  minimization,  which  has  had  enormous  success  in  signal  processing 
for  estimating  a  sparse  variable  from  highly  incomplete  and  noisy  measurements, 
be  applied  to  significantly  improve  the  accuracy  and  efficiency  of  quantum  system 
identification  ? 

The  basic  mathematical  foundations  for  /  rnorm  minimization,  often  generally  referred  to  as 
Compressive  Sensing,  can  be  found  in  [11,  12].  (A  web  search  on  Compressive  Sensing  will 
bring  many  tutorials  and  testimonials).  In  general,  for  £  i  -norm  minimization  to  be  effective, 
the  underlying  signal  (or  parameter  space)  must  be  sparse.  This  in  turn  allows  for  a  significant 
reduction  in  the  number  of  measurements  (resources)  needed  for  reconstructing  the  signal.  Of 
course  if  the  sparsity  pattern  is  known  then  standard  methods  can  be  applied.  Why  it  works  so 
well  is  because  the  (?i-norm  is  a  convex  heuristic  for  sparsity,  which  is  not  a  convex  function. 
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The  analogy  has  been  made  that  it  is  like  solving  a  Sudoku  puzzle:  only  a  few  given  numbers 
in  the  grid  will  force  a  unique  solution  even  if  the  grid  is  large. 

Since  compressive  sensing  methods  can  reduce  resources  by  orders  of  magnitude,  the  ben¬ 
efits  from  a  positive  answer  to  the  above  question  for  quantum  estimation  would  alleviate  (or 
remove)  the  computational  burden.  Beyond  this,  as  miraculous  as  it  may  sound,  this  estimation 
method  would  impact  the  device  performance  directly  much  as  it  has  for  digital  and  medical 
imaging,  e.g.,  [13,  14]. 


3  Hoped  for  benefits 

If  successful,  the  potential  benefits  include  the  following. 

•  Ancilla  assisted  quantum  process  tomography  would  achieve  the  same  accuracy  with  a 
significantly  smaller  number  of  ancilla.  (A  quantum  process  tomography  example  in 
[15],  repeated  here  in  a  later  section,  using  ()  -norm  minimization  required  only  36  mea¬ 
surements  to  estimate  256  parameters  compared  to  standard  methods  which  require  256 
measurements.) 

•  Quantum  metrology  devices  which  rely  on  entangled  states  to  enhance  accuracy  would 
find  relief  in  the  number  of  entangled  particles  required. 

•  Phase  estimation,  which  is  the  example  posed  for  Phase  I,  is  at  the  heart  of  Shor’s  algo¬ 
rithm  (the  quantum  Fourier  transform).  Compressive  sensing  methods  could  significantly 
impact  the  ancilla  real-estate  required  for  the  associated  error-correction. 

•  Instrumentation  limitations  in  both  state  preparation  and  measurement  protocols  would 
not  hinder  estimation  efficiency. 

•  If  Hamiltonian  identification  really  is  fast  and  easy,  then  this  suggests  the  very  important 
possibility  of  a  non-qubit  quantum  analog  computer. 


4  Proposed  tasks 

To  achieve  the  hoped  for  benefits  we  proposed  a  two-phase  program.  Phase  was  to  be  a  the¬ 
oretical  study  to  develop  the  mathematical  and  computational  tools  for  f'x-norm  minimization 
applied  to  quantum  process  tomography  and  quantum  parameter  estimation  metrology.  If  Phase 
I  was  successful,  then  Phase  II  would  bring  in  experimental  components  based  on  the  mathe¬ 
matical  and  computational  tools  developed  in  Phase  I.  The  actual  scope  and  level  of  effort  for 
Phase  II  will  be  determined  in  collaboration  with  DARPA  prior  to  the  end  of  the  Phase  I  effort. 
To  commence  we  posed  the  following  Phase  I  tasks: 

•  Task  1.1  Extend  the  ('i-norm  minimization  theory  to  QPT.  Specifically,  answer  the  ques¬ 
tion:  Is  the  scaling  of  resources  linear  in  the  number  of  qubits? 

•  Task  1.2  Develop  computationally  efficient  f^-norm  minimization  algorithms  which  are 
specific  for  QPT. 
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•  Task  1.3  Apply  the  results  of  Tasks  1.1  and  1.2  to  quantum  metrology.  Specifically,  for 
phase  estimation  in  a  noisy  environment,  answer  the  questions:  Does  the  algorithm  de¬ 
scribed  previously  converge  to  the  correct  phase  within  a  prescribed  tolerance?  Does 
it  use  less  resources  than  standard  approaches?  Can  entangled  inputs  be  eliminated  or 
reduced  in  dimension? 

As  states  in  our  proposal,  it  was  assumed  that  if  Phase  I  is  successful  and  deemed  a  “GO” 
by  DARPA,  then  the  theory  and  tools  developed  up  to  that  point  will  provide  for  experiments 
to  help  further  develop  the  tools  and  theory.  Given  the  emerging  new  concepts  and  software  for 
performing  quantum  system  identification,  it  was  envisioned  that  it  would  be  important  to  have 
a  flexible  working  laboratory  system  to  test  the  tools  and  refine  them.  Or  more  poetically,  as 
Feynman  put  it  [16]: 

“ The  test  of  all  knowledge  is  experiment.  Experiment  is  the  sole  judge  of  scientific  ‘truth.’  ” 

We  proposed  to  test  the  capabilities  of  quantum  system  identification  via  t  i-norm  minimiza¬ 
tion  with  two  types  of  experimental  systems:  an  optical  interferometer  and  atomic  Rb,  each  of 
which  provides  a  flexible  system  with  well  understood  characteristics.  Both  of  these  systems 
are  available  at  Princeton.  Although  we  could  not  specify  exactly  the  tasks  for  Phase  II,  we  did 
propose  the  following  task  framework  to  be  filled  in  after  a  “GO”  decision  has  been  reached. 

•  Task  II.l  Over  the  Phase  II  period  a  full  battery  of  quantum  system  identification  tests 
could  be  performed  to  benchmark  the  new  algorithmic  capabilities  and  provide  feedback 
for  computational  improvements  as  well  as  further  theoretical  developments. 


5  What  was  achieved 

Early  in  the  program,  and  very  much  earlier  than  anticipated,  we  demonstrated  the  effectiveness 
of  using  Compressive  Sensing  (CS)  algorithms  for  Quantum  process  Tomography  (QPT)  on 
simulated  data.  (§A  contains  a  copy  of  the  paper.).  Soon  thereafter  we  extended  CS  theory 
for  QPT  to  account  for  the  restrictions  imposed  by  quantum  mechanics.  We  showed  that  for  a 
('/-dimensional  system,  where  standard  QPT  requires  0(d4)  configurations,  CS  heralds  O(sd) 
configurations,  where  s  is  the  sparsity  level  associated  with  the  best  s-sparse  approximation 
(the  actual  system  need  not  be  sparse).  Over  the  next  several  months  -  in  fact  almost  up  to  the 
end  of  the  originally  proposed  period  of  performance  -  using  data  obtained  from  a  two-qubit 
photonic  experiment  at  the  Center  for  Quantum  Computer  Technology,  Department  of  Physics, 
The  University  of  Queensland,  Australia,  we  demonstrated,  for  the  first  time,  the  use  of  CS  for 
QPT,  which  we  called  CQPT  for  Compressed  Quantum  Process  Tomography.  The  theoretical 
and  experimental  work  was  published  eventually  in  PRL,  a  copy  of  which  is  contained  in  §B. 

To  summarize  this:  the  process  matrix  for  this  2-qubit  experiment  is  16x16.  Taking  into 
account  the  trace  preserving  condition,  QPT  requires  estimating  240  real  parameters.  Standard 
methods  of  QPT  would  require  at  least  that  number  of  experimental  configurations.  Using  CS 
methods,  we  obtain  a  97%  fidelity  with  32  selected  configurations  and  a  94%  fidelity  with  18 
selected  configurations. 
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These  are  just  a  few  of  the  typical  experimental  results.  All  these  conform  extremely  well 
with  our  early  simulations,  as  well  as  being  similar  in  character  to  what  has  been  seen  in  audio 
and  video  processing.  In  these  latter  applications  the  signal  sizes  are  significantly  larger,  e.g., 
O(106),  hence,  specialized  algorithms  have  been  developed  to  account  for  the  signal  structure. 
A  future  effort  is  to  develop  special  algorithmic  structures  for  larger  QPT. 

Due  to  the  early  theory  development,  and  especially  the  unanticipated  and  exciting  early 
experimental  success,  and  the  time  required  to  gather  the  data,  some  of  the  Phase  I  goals  were 
refined,  some  re-defined,  and  some  have  been  out  of  reach  in  the  time  remaining.  We  are  very 
pleased  about  the  impact  of  the  success  with  “real”  data  which  now  compels  some  advanced 
and  new  broad  and  promising  research  directions: 

•  Introduction  of  a  tailored  theory  and  associated  experiment  design  method  for  effective 
scaling  on  multi-qubit  systems. 

•  Development  of  CS  for  Hamiltonian  identification. 

•  Demonstrate  that  these  ID  tools  can  be  used  for  control  and/or  device  design,  where  in 
the  latter  case,  to  correct  for  manufacturing  exigencies. 

Applications  for  CS  applied  to  quantum  systems  are  just  emerging  in  many  areas.  One  can 
envision,  as  we  have  in  the  Phase  I  proposal,  applications  to  interferometry,  quantum  metrology, 
magnetrometry,  spectroscopy,  and  so  on. 

On  a  personal  note,  in  my  initial  discussions  with  Dennis  Healy  we  mused  about  what  the 
potential  could  be  for  this  program.  He  was  very  optimistic,  but  at  the  time,  I  was  not  ready  to 
stick  my  neck  out  that  far.  Considering  our  success  at  this  time,  Dennis  was  right! 


6  What  remains  to  be  done 

Despite  two  no  cost  extensions,  we  ran  out  of  time  to  thoroughly  develop  and  test  our  ideas  for 
applying  compressed  sensing  methods  to  problems  in  Hamiltonian  identification.  Nonetheless 
we  did  develop  a  CS  theory  of  Hamiltonian  identification  valid  for  short  time  scales.  (This 
complements  our  previous  work  in  Hamiltonian  parameter  estimation  [17].)  The  paper  on  this 
subject  will  appear  soon  in  PRA.  A  copy  is  contained  in  §C. 

What  we  were  ultimately  after  was  a  theory  and  associated  computational  method  appli¬ 
cable  to  problems  in  quantum  metrology  or  more  general  interferometry  problems.  These  are 
essentially  single  parameter  estimation  problem.  A  brief  summary  of  what  we  were  (and  are 
still)  thinking  now  follows. 

Figure  1  is  a  block-diagram  operational  representation  of  a  general  interferometer.  Here 
the  unknown  system  <S(0O)  consists  of  a  unitary  (7(0 o)  in  channel  a  dependent  on  an  unknown 
phase  0O  followed  by  an  unknown  noise  operation  S  acting  on  both  channels  a  and  b.  The  usual 
assumption  is  that  the  unitary  is  of  the  form  (7(0 0)  =  exp(/0o77)  with  unknown  phase  0O  and 
known  Hamiltonian  H  [18].  Typically  the  range  of  the  phase  parameter  is  known. 

The  interferometric  set-up  is  envisioned  initially  as  a  Mach-Zehnder  interferometer  (Fig.  2) 
with  the  addition  of  extra  beam  splitters  in  both  arms  to  create  photon  loss  as  expressed  schemat¬ 
ically  in  Fig.  1.  Figure  2  shows  a  schematic  of  the  classical  Mach-Zehnder  interferometer  for 
phase  estimation. 
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Figure  1 :  General  configuration  for  interferometric  phase  estimation 


A  coherent  light  beam  is  split  into  two  parts. 
The  phase  difference  4>  between  the  two  op¬ 
tical  arms  is  estimated  by  analyzing  photon 
statistics  of  the  two  output  beams. 

BS:  beam-splitter 
PD:  photodetector 


Figure  2:  Classical  Mach-Zehnder  interferometer. 

For  single  parameter  (phase)  estimation  the  limit  of  theoretical  accuracy  in  the  ideal  noise- 
free  case  has  been  examined  in  depth,  e.g.,  [19],  [20],  [21],  [18],  [22],  [23].  These  studies  reveal 
that  special  preparation  of  the  instrumentation  -  the  probe  -  can  achieve  an  asymptotic  variance 
smaller  than  the  Cramer-Rao  lower  bound,  the  so-called  Quantum  Cramer-Rao  bound,  or  the 
Quantum  Fisher  Information  (QFI).  Specifically,  the  unique  quantum  property  of  entanglement 
can  increase  the  parameter  estimation  convergence  from  the  shot-noise  limit  of  1/ x^N  to  the 
Heisenberg  limit  1/N,  which  arises  from  the  uncertainty  principle  [24].  In  the  latter  case  N 
refers  to  the  dimension  of  an  entangled  state.  The  theoretical  QFI,  however,  will  not  be  obtained 
in  the  presence  of  noise,  i. e. ,  decoherence.  As  stated  in  [25]: 

“Existing  treatments  come  to  the  conclusion  that  the  benefit  from  highly  entangled  states 
deteriorates  quickly  even  if  only  a  small  amount  of  noise  is  present  in  the  system  ...  states 
of  this  type  are  typically  very  fragile:  In  optical  interferometry,  the  well-studied  NOON  state 
promises  to  provide  Heisenberg  limited  sensitivity  in  phase  estimation  ...  the  loss  of  merely 
a  single  photon  renders  this  state  useless  since  it  collapses  into  a  product  of  two  Fock  states 
which  can  not  acquire  any  phase  information.” 

Fig.  6  shows  a  2-qubit  system  where  the  ideal  single-parameter  unitary  is  corrupted  by  ampli- 
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Figure  3:  Fisher  Information  vs.  number  of  entangled  states  for  varying  levels  of  amplitude 
damping  7. 


tude  damping  7.  In  photonic  systems  7  is  the  probability  of  a  photon  loss.  In  the  ideal  case 
(7  =  0)  Fisher  information  rises  linearly  with  the  number  of  entangled  states.  However,  even 
for  a  small  amount  of  noise  at  the  5%  level  we  start  to  see  a  significant  loss  of  information. 
In  addition  to  this  sensitivity  to  noise,  the  QFI  may  also  be  unreachable  simply  because  the 
instruments  are  limited,  i.e.,  not  all  states  can  be  prepared  and  not  all  measurement  schemes  are 
possible,  e.g.,  [26,  17]. 

To  alleviate  these  problems  we  proposed  using  a  bank  of  estimators  applied  to  the  data, 
where  each  estimator  is  tuned  to  one  of  a  number  of  finite  estimates  of  the  unknown  phase  pa¬ 
rameter.  For  each  phase  estimate  (f>  we  will  generate  from  the  ideal  unitary  U (</>)  an  orthonormal 
basis  set  for  quantum  operations  on  the  combined  channel  ab.  A  quantum  process  tomography 
will  then  be  performed  by  solving  an  ^-norm  minimization  problem  (compressed  sensing)  to 
obtain  the  phase  estimate  dependent  process  matrix.  The  final  phase  estimate  is  selected  as  the 
one  with  the  smallest  (7 -norm  of  the  associated  process  matrix. 

If  this  approach  is  successful,  then  three  significant  benefits  would  immediately  accrue. 
First,  phase  estimation  would  be  accomplished  in  noisy  environments.  At  present  this  is  a  very 
difficult  task  [25] .  Secondly,  the  number  of  entangled  particles  might  be  greatly  reduced.  Lastly, 
this  may  also  reveal  an  alternative  to  the  phase  estimation  algorithms  proposed  for  the  Fourier 
transform  step  in  many  quantum  algorithms  [27,  §5]. 
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For  an  initially  well  designed  but  imperfect  quantum  information  system,  the  process  matrix  is  almost  sparse 
in  an  appropriate  basis.  Existing  theory  and  associated  computational  methods  {l\  -norm  minimization)  for 
reconstructing  sparse  signals  establish  conditions  under  which  the  sparse  signal  can  be  perfectly  reconstructed 
from  a  very  limited  number  of  measurements  (resources).  Although  a  direct  extension  to  quantum  process 
tomography  of  the  £i-norm  minimization  theory  has  not  yet  emerged,  the  numerical  examples  presented  here, 
which  apply  fi-norm  minimization  to  quantum  process  tomography,  show  a  significant  reduction  in  resources 
to  achieve  a  desired  estimation  accuracy  over  existing  methods. 


Quantum  process  tomography  (QPT)  refers  to  the  use  of 
measured  data  to  estimate  the  dynamics  of  a  quantum  system 
[1,  2],  Unfortunately,  in  the  general  case,  the  dimension  of 
the  parameter  space  for  QPT  can  be  prohibitive,  scaling  ex¬ 
ponentially  with  the  number  of  qubits.  This  in  turn  places  the 
same  burden  on  resources,  e.g.,  the  number  of  applied  inputs, 
measurement  outcomes,  and  experiments  to  achieve  a  desired 
accuracy,  as  well  as  estimation  computational  complexity.  A 
number  of  approaches  have  been  developed  to  alleviate  this 
burden.  Of  note  are  the  various  forms  of  ancilla  assisted 
QPT  (see  [3]  for  a  review),  and  the  use  of  symmetrisation 
to  estimate  selected  process  properties  [4],  Here  we  present 
a  method  which  can  be  used  either  alone  or  in  conjunction 
with  any  of  the  aforementioned  approaches.  The  underlying 
premise  is  that  for  an  intially  well  engineered  design,  the  ob¬ 
ject  that  describes  the  quantum  dynamics,  the  process  matrix, 
will  be  almost  sparse  in  the  appropriate  basis.  Certainly  in  the 
ideal  case  of  a  perfect  unitary  channel,  in  the  corresponding 
ideal  basis,  the  process  matrix  is  maximally  sparse,  i.e.,  it  has 
a  single  non-zero  element.  Since  environmental  interactions 
cannot  be  totally  eliminated,  the  actual  process  matrix  in  this 
ideal  basis  will  be  populated  with  many  small  elements,  and 
thus,  is  almost  sparse. 

These  are  the  conditions  under  which  methods  using  i\- 
norm  minmization  -  often  referred  to  as  Compressive  Sens¬ 
ing  -  are  applicable  [5,  6,  7],  Specifically,  for  a  class  of 
incomplete  linear  measurement  equations  (y  =  Ax,  A  £ 
Rmxn,  m  <C  n),  constrained  t\ -norm  minimization  (mini¬ 
mize  1 1  x  1 1 1  subject  to  y  =  Ax),  a  convex  optimization  prob¬ 
lem,  can  perfectly  estimate  the  sparse  variable  x.  These  meth¬ 
ods  also  work  very  well  for  systems  which  do  not  satisfy  the 
theoretical  conditions,  i.e.,  for  almost  sparse  variables  and 
with  measurement  noise. 

The  underlying  theory  of  t\  minimization  shows  that  under 
certain  conditions  on  the  matrix  A,  to  realize  perfect  recovery, 
the  number  of  measurements,  m,  scales  with  the  product  of 
the  log  of  the  number  of  variables  n  and  the  sparsity.  Since 
QPT  parameters  are  linear  in  probability  outcomes,  and  scale 
exponentially  with  the  number  of  qubits,  this  approach  heralds 
a  possible  linear  scaling  with  qubits.  The  theory,  however,  has 
not  as  yet  been  extended  to  QPT.  The  numerical  examples  here 
are  not  meant  to  lend  support  to  this  scaling  as  they  are  only 
presented  for  the  two-qubit  case.  The  examples  do,  however, 
show  more  than  an  order  of  magnitude  savings  in  resources 
over  a  standard  constrained  least-squares  estimation  using  a 


complete  set  of  measurements,  i.e.,  rank(A)  >  n. 

The  paper  is  organized  as  follows:  QPT  formalism  is  de¬ 
scribed  next,  followed  by  a  discussion  of  the  genesis  of  pro¬ 
cess  matrix  (almost)  sparsity.  A  form  of  the  i\  minimization 
for  QPT  is  then  presented  followed  by  numerical  examples 
and  some  concluding  remarks. 

QPT  Formalism. —  Recall  that  the  state-to-state  dynamics 
of  an  open  finite-dimensional  quantum  system  can  be  de¬ 
scribed  in  the  following  canonical  form  [1]: 

P  =  Ei=i  XaPTaPT\  (1) 

where  p,  p  £  C"xrl  are  the  input  and  output  state,  respec¬ 
tively,  of  dimension  n,  Xap  are  the  elements  of  the  n2  x  n2 
process  matrix  X,  and  the  matrices  Ta  form  an  orthonormal 
basis  set  for  n  x  n  complex  matrices: 

{Ta  £  Cnxn  | TV  T*  I>  =  5af 3,  a,  p  =  1, . . . ,  n2  }  (2) 

It  is  assumed  that  the  quantum  system  to  be  estimated  is  com¬ 
pletely  positive  and  trace  preserving  (CPTP).  The  set  of  feasi¬ 
ble  process  matrices  is  then  restricted  to  the  convex  set  [8,  9], 

X  >  0  (positive  semidefinite) 

En2  y  ,pt  p  _  T 

a, (3=1  ^-al3L  pi  a  — 

It  follows  from  (3)  that  the  number  of  real  parameters  in  the 
process  matrix  is  n4  —  n2.  For  q  qubits  n  =  2q,  hence,  scaling 
with  parameters  is  exponential  in  the  number  of  qubits. 

Collecting  data. —  A  common  method  for  collecting  data 
from  a  quantum  system  is  via  repeated  identical  experiments. 
Denote  by  *  =  l,...,nout  the  distinct  outcomes,  and  by 
k  =  l,...,ncfg  the  experimental  configurations,  e.g.,  any 
“knobs”  associated  with  state  inputs  and/or  measurement  de¬ 
vices.  The  measurement  outcomes  are  recorded  from  iden¬ 
tical  experiments  in  each  configuration  k  repeated  Nk  times. 
Let  Nik  denote  the  number  of  times  out  of  Ap  that  outcome 
i  occurred  in  configuration  k.  The  QPT  data  are  the  recorded 
outcome  counts, 

{Xik  \i  —  •  •  •  ?  rtout >  ^  I?  •  •  •  ?  ttcfg  }  (4) 

where  N  =  J2k=i  Nk  =  Efc=i  Yn=\  NU*  is  the  total  num‘ 
ber  of  experiments. 

Estimating  the  process  matrix. —  An  empirical  estimate  of 
the  probability  of  measuring  outcome  i  in  configuration  k  can 
be  obtained  from  (4)  as, 

PT  =  Nik/Nk  (5) 
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From  the  Bom  Rule  the  model  probability  of  outcome  i  given 
configuration  k  with  observable  Mik  is,  pn-  =  Tr  Mikpk > 

2  I 

where  from  (1),  pk  =  l  Xa0TaPk^p-  In  terms  of  the 

process  matrix  X ,  the  Born  rule  then  becomes, 

Pik{X)  =  Tr  GikX 

(Glk)a0  =  Tr  T^MikTaPk  (<1' 

The  ?routticfg  matrices  Gik  £  C"x"  capture  the  effect  of  mea¬ 
surements  in  the  matrix  basis  set  (2).  For  each  outcome  i,  the 
complete  set  of  configurations  is  the  combination  of  all  these 
matrices  and  the  input  states:  {pk,  GifejE-i . 

A  process  matrix  estimate  can  be  obtained  by  minimiz¬ 
ing  the  difference  between  the  empirical  probability  estimates 
p®“p  and  the  model  probabilities  pik(X)  subject  to  the  feasi¬ 
bility  constraint  (3).  Using  a  “least-squares”  measure  of  prob¬ 
ability  error  leads  to  estimating  the  process  matrix  by  solving 
the  optimization  problem: 

minimize  VLsPO  =  E  i,k  ( PikP  ~  Vik{X)f  (?) 

subject  to  A'  satisfies  (3) 

Because  the  outcomes  of  each  experiment  are  independent,  a 
maximum  likelihood  approach  can  also  be  considered,  i.e., 

minimize  VMl(X)  =  -  E i,k  Efc  \ogpik(X) 
subject  to  X  satisfies  (3) 

Both  (7)  and  (8)  are  convex  optimization  problems  with  the 
optimization  variables  being  the  elements  of  X  [8,  9].  The 
resulting  solution  (estimate)  will  always  be  CPTP  (3).  Unfor¬ 
tunately,  as  already  mentioned,  the  dimension  of  the  parame¬ 
ter  space  (n4  —  n2,  n  =  2q)  can  severely  strain  resources  to 
the  point  of  impracticality.  To  see  this  more  clearly,  let  the 
linear  relation  in  (6)  between  the  noutncfs  model  probability 
outcomes  and  the  n4  elements  of  the  process  matrix  be  repre¬ 
sented  by  an  noutticfg  x  n4  matrix  Q,  i.e., 

p  =  QX  (9) 

where  p,  X  are  vectors  formed  from  the  Pik  and  elements  of 
X,  respectively.  Accounting  for  the  n2  linear  constraints  in 
(3),  X  can  be  recovered  from  either  (7)  or  (8)  to  within  any 
desired  accuracy  by  using  enough  data  ( N  in  (4)  sufficiently 
large),  provided  thatrank(Cy)  >  noutticfg  >  n4  —  n2.  There¬ 
fore  it  would  seem  that  the  resources,  noutnc{g,  must  also 
scale  exponentially  with  the  number  of  qubits.  This,  however, 
is  not  the  case  when  the  process  matrix  is  almost  sparse  and 
where  the  sparsity  pattern  is  not  known[17]. 

Almost  sparsity  of  the  process  matrix. —  With  no  noise 
the  ideal  channel  p  — >  p  for  a  quantum  information  sys¬ 
tem  is  a  unitary,  i.e.,  p  =  U pU\  Let  {fa  £ 
denote  the  “Natural-Basis”  for  matrices  in  C”xra,  i.e.,  each 
basis  matrix  has  a  single  non-zero  element  of  one.  In  this 
basis,  the  process  matrix  associated  with  the  ideal  unitary 
channel  has  the  rank-1  form,  A;deai  =  xx'  with  x  £ 
Cn  ,  x'x  =  n.  A  singular  value  decomposition  (SVD)  gives 
A’ideai  =  Udiag(n,  0, . . .  ,0)Vt  with  V  £  Cnxn  a  unitary. 
An  equivalent  process  matrix  can  be  formed  from  the  SVD 


|X|  in  Natural-Basis 


|X|  in  Ideal/SVD-Basis 


(b) 


FIG.  1 :  Absolute  values  of  the  elements  of  the  process  matrix  X  £ 
Ci6xi6  for:  (a)  ideal  in  the  Natural-Basis;  (b)  ideal  in  Ideal/SVD- 
Basis;  (c)  actual  (pbf  =  0.05)  in  Natural-Basis,  (d)  actual  (pbf  = 
0.05)  in  Ideal/SVD-Basis;(e)  actual  (pbf  =  0.2)  in  Natural-Basis,  (f) 
actual  (pbf  =  0.2)  in  Ideal/SVD-Basis. 


in  what  is  referred  to  here  as  the  “Ideal/SVD-Basis,”  {Ta  = 

Ea'=i  Kt'affP  £  Cnxn}™2=1,  The  equivalent  process  ma¬ 
trix,  in  this  basis,  denoted  by  Ajdeah  is  maximally  sparse  with 
a  single  non-zero  element,  specifically,  (A;deai)ii  =  n.  As 
will  always  be  the  case,  the  actual  channel  will  be  a  perturba¬ 
tion  of  the  ideal  unitary.  If  the  noise  source  is  small  then  the 
process  matrix  in  the  nominal  basis  will  be  almost  sparse. 

Example:  Noisy  two-qubit  memory. —  Consider  a  system 
which  is  ideally  a  two-qubit  quantum  memory,  thus  U  = 
I 4 ,  n  =  4.  Suppose  the  actual  system  is  a  perturbation  of 
identity  by  independent  bit-flip  errors  in  each  channel  occur¬ 
ring  with  probability  pbf-  Forpbf  =  0.05  andpbf  =  0.2,  the 
respective  channel  fidelities  are  about  0.90  and  0.64,  which  for 
quantum  information  processing  would  need  to  be  discovered 
by  QPT  and  then  corrected  for  the  device  to  ever  work.  Refer¬ 
ring  to  Fig.l,  in  the  Natural-Basis,  Fig.  1(a),  the  ideal  16  x  16 
process  matrix  has  16  non-zero  elements  out  of  256,  all  of 
magnitude  one.  Using  the  Ideal/SVD-Basis  the  correspond¬ 
ing  process  matrix  as  shown  in  Fig.  1(b)  has  a  single  non-zero 
element  of  magnitude  n  =  4  -  it  is  clearly  maximally  sparse. 
Fig.l(c)-(d)  and  (e)-(f),  respectively,  show  the  effect  of  the 
two  pbf  levels  in  the  two  basis  sets.  In  the  Ideal/SVD-basis 
Fig.  1(d)  and  (f)  show  that  the  actual  (noisy)  process  matrices 
are  almost  sparse. 

Sparsity  minimization. —  A  known  heuristic  for  minimizing 
sparsity  without  knowing  the  sparsity  pattern,  and  also  accru- 
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ing  the  benefit  of  using  fewer  resources,  is  to  minimize  the  £\- 
norm  of  the  vector  of  variables  [5,  6,  9].  For  QPT  the  equiva¬ 
lent  t\  norm  is  defined  here  as  the  sum  of  the  absolute  values 
of  the  real  and  imaginary  parts  of  each  element  of  the  process 
matrix.  There  are  many  related  approaches  to  incorporate  this 
norm.  For  example,  an  estimate  of  X  can  be  obtained  by  solv¬ 
ing  the  following  convex  optimization  problem:  [18] 

minimize  ||X||^  =  X^a,/3=i(lRe  Xa/3|  +  |Im  Xa/3|) 
subject  to  V(X)  <  cr,  X  satisfies  (3) 

(10) 

with,  e.g .,  V(X)  from  (7)  or  (8).  The  optimization  parameter 
a  is  used  to  regulate  the  tradeoff  between  fitting  A'  to  the  data 
by  minimizing  V(X)  vs.  minimizing  the  sparsity  of  X  via  the 
T]  -norm.  Selecting  cr  is  often  done  by  averaging  V (X)  over  a 
series  of  surrogates  for  X  obtained  from  anticipated  scenarios 
or  iterating  estimation  and  experiment  design,  e.g.,  [8], 

In  the  examples  to  follow  we  use  the  modification  of  (10) 
suggested  in  [7],  referred  to  there  as  “t\  -reweightcd  mini¬ 
mization.”  In  this  approach  a  weighted  l\  -norm  is  used  with 
the  weights  determined  iteratively.  The  algorithm  described 
in  [7]  is: 

Initialize  cr>0,  e>0,  W  =  Ini 

Repeat 

1 .  Solve  for  X 

minimize  \\WX\\ei 

subject  to  V (A)  <  a,  X  satisfies  (3) 

2.  Update  weights 

W  =  diag  (1/ (|tci  |  +e),...,l/  (\xni  \  +  e)) 
x  =  X 

Until  convergence  -  the  objective  stops  decreasing  or  a  max¬ 
imum  number  of  iterations  is  reached. 

In  each  of  the  examples  to  follow  the  procedure  for  QPT  is: 
(i)  solve  (7)  to  obtain  Xu 2 ;  (ii)  set  cr  =  1.3T  ( Xg2 ) ;  (iii)  solve 
the  reweighting  algorithm  (1 1)-(12)  for  Xf  .t . 

Example:  QPT  of  noisy  two-qubit  memory. —  For  the  sys¬ 
tems  from  the  example  in  Fig.l,  the  inputs  and  measurements 
are  selected  from  the  set  of  two-qubit  states:  |a),  |  +  )  = 
(|a)  +  |&))/v/2,  |  — )  =  {\a)  —  i\b))/\/2  with  a,  b  =  1, . . . ,  16. 
Specifically,  the  available  set  of  states  are  the  16  columns  of 
the  matrices. 
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Considering  only  coincident  input/measurement  counts  [10], 
the  relevant  probability  outcomes  (6)  are, 

Pab(X )  =  g\bXgab,  X  e  C*xie  (M) 

(. gab)cx.  ot^bt  &  1?  •  •  •  5  16 

with  (j)a,(j>b{a,b)  £  {1, . . . ,  16}  the  selected  columns  of  (13). 

Fig. 2  shows  the  error  in  estimating  the  process  matrix 
AX  =  Xtrue  —  Xest  as  measured  by  the  RMS  matrix  norm 
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FIG.  2:  RMS  estimation  error  ||Xtrue  —  Xest||rms  vs.  number  of 
experiments  per  configuration:  selected  columns  of  (13).  Error  bars 
show  the  deviation  from  50  runs  at  each  setting. 

I2 -minimization  (□):  Xest  =  A72  is  from  (7)  using  all  16  in¬ 
put/output  combinations.  This  gives  a  matrix  G  £  c256x256  as  de¬ 
fined  in  (9)  which  is  full  rank,  i.e.,  rank (G)  =  256. 

£1  -minimization  (0):  Xest  =  Xix  is  from  (11)-(12)  using  6  in¬ 
puts  and  6  measurements  obtained  from  the  columns  of  the  second 
matrix  in  (13).  This  gives  G  £  c36x256  wp, jch  js  fup  rank; 
rank  (G)  =  36. 
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Xt2,G  £  C2 
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5  50  500 

experiments  per  configuration  (xIO3) 


(b)  »bf  =  0.20 


II  AX Hrms  =  (l/n)(rH;’  AX^AX)1/2  vs.  the  number  of  ex¬ 
periments  per  input  selected  from  the  set  (13)  [19].  The  results 
shown  are  from  simulations  described  in  the  caption. 

The  benefit  of  Q -minimization  compared  to  the  standard 
^-minimization  is  seen  most  clearly  with  small  amounts  of 
data  from  highly  incomplete  measurements.  For  example,  for 
Pbf  =  0.05  [Fig. 2(a)],  at  50  x  103  experiments  per  input  for 
the  6-input/6-output configuration  ((?  £  c36x256)  the  t\  RMS 
estimation  error  is  0.0019.  Compare  this  to  the  £2  error  of 
0.0012  at  500  x  103  experiments  per  input  for  the  16-input/16- 
output  configuration  (Q  £  c256x256).  The  latter  improvement 
can  be  attributed  mostly  to  the  10-fold  increase  in  the  number 
of  experiments  per  input.  The  additional  resources  to  achieve 
this  are  significant,  i.e.,  16  inputs  for  i2  vs.  6  for  l\ ,  and 
additionally,  an  increase  in  the  total  number  of  experiments 
from  6  x  50  x  103  to  16  x  500  x  103.  It  is  certainly  not  in¬ 
tuitive  that  to  estimate  the  240  parameters  of  the  process  ma¬ 
trix,  the  clearly  incomplete  set  of  measurements  using  only  36 
outcomes  (0  in  Fig. 2)  could  produce  results  not  only  similar 
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to,  but  for  each  number  of  experiments  per  input,  even  better 
than  the  full  input  case  with  all  256  combinations  of  inputs 
and  measurements  (□  in  Fig. 2).  As  seen  the  £\  error  is  about 
1/2  the  li  error.  Also,  reweighting  reduced  the  (unweighted) 
tx  error  by  1/2-1/3. 

Comparing  the  estimation  errors  with  the  error  between  the 
actual  and  ideal  (solid  lines  in  Fig.2)  suggests  that  at  least 
50  x  101 2 3 4 5 6 7 8 9 10 11 12  experiments  per  input  are  needed  to  achieve  a  suf¬ 
ficient  post-QPT  error  correction  towards  the  ideal  unitary. 
Fig.2  also  reveals  that  the  estimation  errors  are  very  similar 
for  both  levels  of  bit- flip  error,  pbf  €  {0.05,0.20}.  This  is  ex¬ 
plained  by  the  Cramer-Rao  bound  which  defines  the  asymp¬ 
totic  error  of  any  unbiased  estimator,  i.  e. ,  the  RMS  decays  as 
A/y/N.  Here  A  is  effectively  the  error  between  the  empirical 
(5)  and  actual  (6)  probabilities  which  by  definition  is  of  order 
one;  this  provides  a  reasonable  fit  to  the  data  in  Fig.2. 

Infinite  data. —  With  infinite  data  the  measurements  are 
effectively  noise-free,  so  the  empirical  probability  estimates 
are  equivalent  to  the  true  probabilities.  Infinite  data  esti¬ 
mates  are  obtained  by  solving  (7)  and  (1 1)-(12)  with  the  con¬ 
straint  V(X)  <  a  replaced  by  the  linear  equality  constraint 
Pik(X)  =  Pik(X true)-  For  the  numerical  examples  here,  (14) 
gives  the  linear  equality  g\b{X  -  Xtlue)gab  =  0. 

In  the  examples,  both  Xix  from  (11)-(12)  and  Xp2  from 
(7)  were  numerically  equal  to  Atnle.  This  is  to  be  expected 
for  Xe2  because  of  the  complete  set  of  256  full  rank  mea¬ 
surements.  Almost  sparsity  makes  perfect  estimation  possible 
with  the  highly  incomplete  set  of  36  measurements. 

The  infinite  data  case  is  useful  for  evaluating  different  con¬ 
figuration  strategies  in  simulation,  i.  e. ,  consider  only  those 
that  result  in  a  good  estimate. 

To  stress  the  efficacy  of  i\ -minimization  as  a  heuristic  for 
sparsity,  consider  replacing  the  £\  norm  in  (1 1)-(12)  with  the 


RMS  norm  ||Aj|rms,  which  is  effectively  the  £ 2  norm  of  X. 
Solving  the  6-input/6-output  case  (0  in  Fig.2)  for  pbf  =  0.05 
with  infinite  data  gives  an  RMS  error  of  0.11,  which  is  con¬ 
siderably  larger  than  the  error  between  the  actual  and  ideal 
of  0.03  (solid  line  in  Fig. 2(a)).  The  estimate  gets  even  worse 
with  finite  data.  This  again  emphasizes  the  advantage  of  £\ 
minimization  for  sparse  signal  reconstruction  [5,  6]. 

Conclusions. —  The  use  of  the  £ i-norm  minimization  meth¬ 
ods  of  Compressive  Sensing  [5,  6,  7]  appear  to  apply  equally 
well  to  sparse  QPT.  The  examples  of  sparse  process  matrices 
presented  here  are  meant  to  represent  typical  initial  imperfect, 
designs.  The  numerical  results  illustrate  how  estimation  re¬ 
source  tradeoffs  can  be  obtained.  Additionally,  the  findings 
suggest  that  QPT  resources  need  not  scale  exponentially  with 
qubits.  In  the  ideal  case,  the  theoretical  question  of  showing 
linear  scaling  with  sparsity  using  t  \  minimization  for  QPT  re¬ 
mains  open. 

Because  £\  minimization  uses  considerably  fewer  resources 
than  standard  QPT,  use  in  an  on-line  setting  combined  with 
optimal  quantum  error  correction  tuned  to  the  specific  QPT 
errors  is  compelling,  e.g.,  [11,  12,  13],  Another  future  direc¬ 
tion  is  in  conjunction  with  Hamiltonian  parameter  estimation. 
Here  a  bank  of  estimators  can  be  applied  to  the  data  where 
each  estimator  is  tuned  via  the  Ideal/SVD-Basis  to  one  of  a 
number  of  finite  samples  of  the  unknown  parameters.  Such  an 
approach  may  prove  useful  for  a  small  number  of  parameters. 
In  quantum  metrology  often  a  single  uncertain  parameter  is  to 
be  estimated  in  an  unknown  noisy  environment,  e.g.,  [14,  15]. 
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Understanding  and  controlling  the  world  at  the 
nanoscale — be  it  in  biological,  chemical  or  physical 
phenomena — requires  quantum  mechanics.  It  is  therefore 
essential  to  characterize  and  monitor  realistic  complex 
quantum  systems  that  inevitably  interact  with  typically 
uncontrollable  environments.  One  of  the  most  general 
descriptions  of  the  dynamics  of  an  open  quantum  system 
is  a  quantum  map — typically  represented  by  a  process 
matrix  [1].  Methods  to  identify  this  matrix  are  collectively 
known  as  quantum  process  tomography  (QPT)  [1,2].  For  a 
d-dimensional  quantum  system,  they  require  Old4)  experi¬ 
mental  configurations:  combinations  of  input  states,  on 
which  the  process  acts,  and  a  set  of  output  observables. 
For  a  system  of  n  qubits — two  level  quantum  systems — 
d  =  2".  The  required  physical  resources  hence  scale 
exponentially  with  system  size.  Recently,  a  number  of 
alternative  methods  have  been  developed  for  efficient  and 
selective  estimation  of  quantum  processes  [3].  However, 
full  characterization  of  quantum  dynamics  of  comparably 
small  systems,  such  as  an  8-qubit  ion  trap  [4],  would  still 
require  over  a  billion  experimental  configurations,  clearly 
impractical.  So  far,  process  tomography  has  therefore 
been  limited  by  experimental  and  off-line  computational 
resources,  to  systems  of  2  and  3  qubits  [5-7]. 

Here  we  adapt  techniques  from  compressive  sensing 
to  develop  an  experimentally  efficient  method  for  QPT.  It 
requires  only  0(s  logd)  configurations  if  the  process  matrix 
is  s  compressible  in  some  known  basis,  i.e.,  it  is  nearly 
sparse  in  that  it  can  be  well  approximated  by  an  .^-sparse 
process  matrix.  This  is  usually  the  case,  because  engi¬ 
neered  quantum  systems  aim  to  implement  a  unitary 
process  which  is  maximally  sparse  in  its  eigenbasis. 
In  practice,  as  observed  in  liquid-state  NMR  [8],  photonics 
[5,9,10],  ion  traps  [11],  and  superconducting  circuits  [6], 


PACS  numbers:  03.65.Wj,  03.65.Yz,  03.67.Lx 


a  near-unitary  process  will  still  be  nearly  sparse  in  this 
basis,  and  still  compressible.  The  near  sparsity  is  due  to 
few  dominant  system  environment  interactions.  This  is 
more  apparent  for  weakly  decohering  systems  [12], 

We  experimentally  demonstrate  our  algorithm  by  esti¬ 
mating  the  240  real  parameters  of  the  process  matrix  of  a 
canonical  photonic  two-qubit  gate,  Fig.  1,  from  a  reduced 
number  of  configurations.  From  just  18  and  32  configura¬ 
tions,  we  obtain  fidelities  of  94%  and  97%  with  process 
matrices  obtained  from  an  overcomplete  set  of  all  576 
available  configurations. 

Compressive  sensing  provides  methods  for  compression 
of  information  carried  by  a  large-size  signal  into  a  signifi¬ 
cantly  smaller  one  along  with  efficient  convex  opti¬ 
mization  algorithms  to  decipher  this  information  [13]. 
Originally  developed  to  exploit  compressible  features  of 


FIG.  1  (color  online).  Experimental  scheme.  Two-photon  in¬ 
puts  were  prepared  with  either  (a)  a  high-rate,  nonscalable,  two- 
photon  source  or  (b)  a  low-rate,  scalable,  four-photon  source. 
The  qubits  are  encoded  using  polarization,  as  described  in  the 
text.  The  quantum  process  is  a  photonic  entangling  gate.  A 
measurement  configuration  is  defined  as  some  combination  of 
state  preparation  and  an  observable,  implemented  here  with 
quarter-  and  half-wave  plates  (QWP,  HWP),  polarizers  (PBS), 
and  photon  detectors  (APD).  For  details  see  [19]. 
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audio  and  video  signals,  compressive  sensing  is  now  ap¬ 
plied  to:  simulations  of  compressive  sensing  for  QPT  [14], 
ghost  imaging  [15],  and  state  tomography  for  low-rank 
density  matrices  [16].  The  latter  provides  a  quadratic  re¬ 
duction  of  physical  resources  from  d 2  for  standard  tomog¬ 
raphy  to  0{rd\og2d )  for  a  density  matrix  of  rank  r  with  the 
added  advantage  that  rank  is  basis  independent.  Recently, 
this  method  has  been  useful  in  efficient  state  tomography 
of  one-dimensional  systems  approximated  by  matrix  prod¬ 
uct  states  [17]. 

Under  reasonable  assumptions,  a  quantum  map  on  a 
d-dimensional  space  has  the  general  representation  [1], 

<P 

S{p)  =  X  (1) 

a,P=\ 

where  the  d2  X  d2  process  matrix,  is  positive  semi- 
definite,  x  —  0,  and  trace  preserving,  = 

ld,  with  {Ta}  an  orthonormal  matrix  basis  set,  Tr(l  ^r„)  = 
8ap.  Note  that  sparsity  is  a  property  of  the  map  represen¬ 
tation  not  the  map  itself.  Data  is  collected  by  preparing 
an  ensemble  of  identical  systems  in  one  of  the  states 
{pi, . . . ,  pk},  inputting  them  to  the  process  x>  and  then 
measuring  an  observable  M  from  the  set  { /V/ , , . . .,  Mt}. 
For  a  pair  (p,  M),  the  outcome  will  be  yM  = 
Tr {S{p)M).  If  the  experiment  is  repeated  for  all  configu¬ 
rations,  i.e.,  (p,-,  Mt),  i  =  1, . . .,  m  =  M,  the  relation  be¬ 
tween  the  vector  of  outcomes  y  =  [yMl,Pl.  •  •  •>  yMm,Pmf 
and  the  true  process  matrix,  denoted  by  xo*  can  be  repre¬ 
sented  by  a  linear  map  y  =  fh^n,  where  xo  is  the  vector¬ 
ized  form  of  the  process  matrix  xo  and  $  is  an  m  X  d4 
matrix  of  coefficients  of  the  form  Tr(  k„p,  I  -Jm. 

In  general,  estimating  a  sparse  process  matrix  with  an 
unknown  sparsity  pattern  from  an  underdetermined  set 
of  linear  equations  (m  <  d4)  would  seem  highly  unlikely. 
Compressive  sensing,  however,  tells  us  that  this  can  be  done 
by  solving  for  x  from  the  convex  optimization  problem: 

minimize  ||*||€l  subject  to  ||y  -  <&x\\e2  -  £>  (2) 


and  positive-semidefinite  and  trace-preserving  conditions 
as  defined  above.  The  parameter  e  quantifies  the  level 
of  uncertainty  in  the  measurements,  that  is,  we  observe 
y  =  +  w  with  ||vv||^2  <  e.  From  [18],  recovery  via 

(2)  is  ensured  if  (i)  the  matrix  O  satisfies  the  restricted 
isometry  property: 


1  -  5  ,. 


II^YiOO  -  ®X2(s)\\\ 
IIyiO)  -  U2<»llf2 


(3) 


for  all  .v-sparse  ^  (.v),  Xi(s)  process  matrices;  (ii)  the 
isometry  constant  82s  <  \fl  —  1  and  (iii)  the  number  of 
configurations  m  >  C(l.v  ]og(z74/.v).  Under  these  conditions, 
the  solution  x *  °f  (2)  satisfies, 

\\X*  ~  Xo\\e2  ^  “  Uollr,  +  C2s  (4) 


where  ^(.v)  is  the  best  .v-sparse  approximation  of  xo  and 
Cq,  Ci,  C2  are  constants  on  the  order  of  C(5S),  see  [19]. 
The  restricted  isometry  property  states  that  two  .v-sparse 
process  matrices  xi  (3’)  and  Xi(s)  can  he  distinguished  if 
their  relative  distance  is  nearly  preserved  after  the  measure¬ 
ments.  If  the  measurements  are  noise  free,  e  =  0,  and  xo 
is  s  sparse,  Xu  =  Uo(5)>  then  the  right-hand  side  of  (4) 
is  zero  leading  to  perfect  recovery,  x*  =  To-  Otherwise 
the  solution  tends  to  the  best  s-sparse  approximation  of 
the  process  matrix  plus  the  additional  term  due  to  measure¬ 
ment  error  s.  If  for  an  n-qubit  QPT  with  d  =  2"  the  con¬ 
ditions  of  the  above  analysis  are  satisfied,  then  the  number 
of  experimental  configurations  m  scales  linearly  with  sn , 
specifically,  m  >  C0i(4;7log2  —  logs)  =  0{sn).  In  [19], 
using  the  measure  concentration  properties  of  random  ma¬ 
trices,  following  the  arguments  in  [20],  we  show  that  if  $  is 
constructed  from  random  input  states  {p,},  and  random 
observables  {/W,},  then  the  restricted  isometry  in  (3)  holds 
with  high  probability.  Also  a  test  is  presented  to  certify 
the  sparsity  assumption. 

A  nearly  sparse  process  matrix  can  be  recovered  from  an 
exponentially  smaller  number  of  measurement  outcomes 
to  within  the  bounds  of  (4)  by  solving  (2).  We  now  test  our 
algorithm  experimentally  against  standard  QPT  on  a  two- 
qubit  gate  under  a  range  of  decoherence — and  thus  spar¬ 
sity — conditions.  We  used  a  photonic  controlled-phase, 
CZ,  gate,  Fig.  1  where  the  qubits  are  encoded  in  orthogonal 
polarization  states  of  single  photons  (|//),  horizontal, 
and  |  V),  vertical).  We  performed  full  process  tomography 
[5,9,10]  of  the  gate  with  both  two-photon  and  four-photon 
arrangements,  preparing  16  pairwise  combinations  of 
the  4  input  states  {|/7),  |V),  | D),  |R)}  and,  for  each  input, 
measuring  36  two-qubit  combinations  of  the  observables 
{I/O,  in  I D),  \A),  | R),  |L>},  where  |D(A)>  =  (| H  > 
±|V>)/\/2  and  \R(L)>  =  (\H  >  ±i|V>)/v/2.  These 
576  input-output  configurations  represent  an  overcomplete 
set  which  allows  the  best  possible  estimate  of  the  quantum 
process,  denoted  xsi(>  [5]. 

The  compressed  quantum  process  tomography  (CQPT) 
estimate  of  the  16  X  16  process  matrix,  xm->  is  obtained  by 
solving  (2)  with  v  =  Cseip  and  =  CseiG  where  p  is  the 
vector  of  576  experimental  probabilities  corresponding  to 
each  of  the  576  configurations,  G  is  the  576  X  256  matrix 
obtained  from  all  the  configurations  with  the  basis  set  {!'„}, 
and  Csel  is  the  m  X  576  matrix  corresponding  to  taking  a 
selection  of  m  <  576  of  all  possible  configurations.  The 
basis  set  is  obtained  from  the  singular-value  decomposition 
of  the  ideal  CZ  gate:  the  process  matrix  in  this  basis  is 
maximally  sparse  with  a  single  nonzero  (1,1)  element.  The 
measurement  error  bound  s  in  (2)  is  chosen  to  be  just 
slightly  larger  than  Jmcr,  where  a  is  the  minimum  feasible 
root-mean- square  level  obtained  from  (2)  using  all  con¬ 
figurations,  i.e.,  with  Cse]  =  /57ft.  We  quantify  decoherence 
using  the  process  purity,  T  =  Tr(xl,/d2),  which  varies 
from  0  for  a  completely  decohering  channel,  to  1 
for  a  unitary  process:  in  our  experiment  we  used  six 
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FIG.  2  (color  online).  Process  fidelities  vs  number  of  input- 
output  configurations,  for  each  compressive  QPT  estimate,  x,n, 
in  the  gate  basis  of  the  ideal  CZ  gate  for  the  lowest  measured 
noise  level,  T  =  0.91.  The  dashed  line  shows  the  fidelity  of  the 
full  estimate  T{UCZ,  Xsie)  =  0.89  (black  diamond).  Error  bars 
are  obtained  by  solving  (2)  for  50  different  random  combinations 
of  m  inputs  and  observables. 
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decoherence  levels  (see  [19]  for  details),  giving  purities 
of  {0.62,  0.74,  0.77,  0.79,  0.86,  0.91}  ±  0.01. 

Figure  2  shows,  for  the  lowest  decoherence  level, 
the  process  fidelities  [5]  versus  the  number  of  randomly 
selected  configurations,  m.  Each  process  matrix,  {xm}, 
is  obtained  by  solving  (2).  We  use  the  fidelity  bet¬ 
ween  (i)  the  compressive  measurement  and  the  ideal, 
f{Ucz,  x,,,)',  and  (ii)  the  compressed  and  optimal  measure¬ 
ments,  TiXsifr  Xm)-  Note  that  as  m  increases  the  fidelity 
with  the  ideal  converges  to  the  value  of  0.89  obtained 
from  X5K>',  likewise,  the  fidelity  with  the  full  estimate 
converges  to  unity.  Similar  plots  exist  for  every  level  of 
decoherence,  with  fidelities  reduced  accordingly. 

We  have  so  far  used  random  selections  of  probabilities 
from  the  full  data  set,  which  allows  us  a  comprehensive 
test  of  CQPT.  Experiments,  however,  do  not  yield 


probabilities  but  physical  quantities,  e.g.,  count  rates.  To 
date,  algorithms  for  more  efficient  state  [16]  or  process 
tomography  have  assumed  probabilities  as  a  starting  point. 
Since  normalization  is  an  issue  to  some  extent  in  all 
physical  architectures,  it  will  be  necessary  to  investigate 
the  robustness  and  scalability  of  algorithms  for  real-world 
experiments. 

For  our  photonic  two-qubit  gate,  which  is  lossy  and 
intrinsically  probabilistic,  the  probabilities  were  obtained 
by  normalizing  counts  using  a  full  basis  set  of  observables 
extracted  from  all  measurements,  /576.  Having  sufficient 
configurations  to  allow  for  normalization  necessarily  im¬ 
poses  limits  on  CQPT  efficiency:  for  low  m,  we  are  re¬ 
stricted  in  how  random  our  selections  can  be.  (Details  and 
some  permissible  configurations  in  [19]).  As  an  example. 
Fig.  3  shows  process  matrices  reconstructed  via  CQPT 
from  just  one  of  these  configurations  compared  to  the 
respective  full  data  estimates.  We  used  32  combinations 
of  the  16  inputs  {|//),  |  V),  \ D),  |R)}®2  and  2  observables 
{|R)|7),  |/)|R)},  where  /  is  the  identity.  The  agreement  is 
excellent  as  one  can  see  from  the  fidelities  and  the  correct 
reproduction  of  imaginary  elements — which  are  ideally 
zero.  Another  striking  feature  is  that  we  obtain  highly 
faithful  reconstructions  of  a  nonlocal  process  using  only 
local  measurements  [2], 

A  further  crucial  test  is  whether  CQPT  enables  us  to 
locate  errors  and  implement  necessary  corrections:  a  com¬ 
mon  example  is  identifying  local  rotations  that  move  the 
process  closer  to  the  ideal.  By  optimizing  {F(LCZ,  Xu), 
we  calculated  local  corrections  to  Xyi’,  applying  them  to 
the  full  estimate  ^575,  {F(Ccz,  Xsi6)  improved,  on  average, 
over  all  decoherence  levels,  by  4.1%.  This  is  very  close  to 
the  average  4.9%  improvement  obtained  by  calculating 
and  applying  local  corrections  directly  to  Xyi(,-  Even  a 
low-configuration  CQPT  estimate  of  a  noisy  process  there¬ 
fore  enables  improvements. 


FIG.  3  (color  online).  Real  and  imaginary  process  matrix  elements  in  the  Pauli  basis  for  the  CQPT  estimate  Xn ,  32  configurations 
(left)  vs  full  data  estimate  ^576,  576  configurations  (right)  for  (a)  a  low  noise,  two-photon  experiment,  T  =  0.91,  and  (b)  a  high-noise, 
four-photon  experiment,  T  =  0.62.  The  CQPT  reconstructions  have  fidelities,  T(x 576,  Xn),  °f  95%  and  85%,  respectively.  The  CQPT 
estimation  accuracy  is  excellent  for  low  noise,  and  reliable  even  for  high  noise,  see  [19]  for  more  details. 
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FIG.  4  (color  online).  Absolute  values  of  the  256  process 
matrix  elements  of  f°r  our  lowest  and  highest  noise  level, 
sorted  by  relative  magnitude  [with  respect  to  the  (1,1)  element] 
in  the  CZ  basis  (top)  and  the  Pauli  basis  (bottom).  The  error 
threshold,  which  indicates  the  required  number  of  configura¬ 
tions,  is  shown  in  grey. 

That  high-fidelity  estimates  are  obtained  by  CQPT  can  be 
understood  from  the  error  bound  (4)  which  shows  that  the 
CQPT  estimate  tends  towards  the  best  .v-sparse  approxima¬ 
tion  of  the  true  process,  Xsi(>-  Figure  4  shows  the  process 
matrix  elements,  sorted  by  relative  magnitude,  for  low-  and 
high-noise  levels,  in  two  basis  sets.  The  .v-sparse  approxi¬ 
mation  levels  indicated  in  (4)  are  reached  where  the  matrix 
elements  drop  below  the  error  threshold  (0.0 1-0.02).  For  the 
corresponding  m,  we  can  therefore  expect  a  successful, 
high-fidelity,  CQPT  reconstruction.  In  the  CZ  basis,  the 
plots  show  that  for  low  noise,  s  G  [20,  30],  which  correlates 
well  with  the  fidelities  in  Fig.  2;  for  high  noise  s  G  [40,  60]. 
Although  the  process  matrix  is  still  somewhat  sparse  in  the 
Pauli  basis  (Fig.  3).  Figure  4(b)  indicates  that  —100  con¬ 
figurations  are  needed  to  obtain  an  estimate  of  comparable 
quality.  Furthermore,  the  sorted  magnitude  values  in  the  CZ 
basis  decay  exponentially,  which  is  sufficient  to  declare 
the  process  matrix  s  compressible,  see,  e.g.,  [21,22].  In- 
triguingly,  this  exponential  decay  is  a  signature  of  model- 
based  compressive  sensing  where  the  scaling  goes  from 
m  =  0(s  log(r//.v))  to  m  =  0(s)  [22].  This  demands  further 
investigation,  since  it  appears  that  QPT  fits  this  framework, 
particularly  when  the  process  matrix  is  expanded  in  the 
ideal  basis  corresponding  to  the  unitary  design  goal. 

Our  experimental  results  are  supported  numerically 
by  simulations  of  a  2-qubit  process  as  well  as  simulation 
studies  for  3-  and  4-qubit  systems  which  show  the  same 
type  of  compressibility,  see  [19].  Applying  CQPT  to  larger 
systems  will  require  careful  attention  to  classical  postpro¬ 
cessing  which — as  in  QPT — scales  exponentially.  The 
standard  software  we  used  here  (see  [19]),  can  easily 
handle  2-  and  3-qubit  CQPT  systems.  For  larger  systems, 
more  specialized  software  can  increase  speed  by  orders  of 
magnitude,  see,  e.g.,  [21]. 

A  number  of  research  directions  arise  from  this  work: 
incorporating  knowledge  of  model  structure  properties; 
tightening  the  bounds  on  scaling  laws;  understanding 


how  near-sparsity  s  and  rank  r  vary  with  system  dimen¬ 
sion,  d;  pursuing  highly  efficient  convex-computational 
algorithms;  and  selection  of  optimal  configurations. 
Compressive  tomography  techniques  can  also  be  applied 
to  quantum  metrology  and  Hamiltonian  parameter  estima¬ 
tion:  for  example,  estimating  selective  properties  of 
biological  or  chemical  interest  in  molecular  systems  and 
nanostructures  with  typically  sparse  Hamiltonians  [23], 
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APPENDIX  A:  NORMS 


Definitions  of  the  norms  used  throughout  the  paper.  For  a 
vector  x  £  C", 


,4 

for  all  x  £  Cd  ,  where  8S  £  (0, 1)  and  C3(<5S)  only  depends 
on  8S ,  then  $  satisfies  the  RIP, 

(1  -  Ss )  ||a:s|||  <  ||<Ecs|||  <  (1  +  58)  ||zs||*2  (B2) 


\\x\\i2  =  Vxh:=  VEUi  W2 

11*11/,  =  E"=t  W- 


(Al) 


For  a  matrix  A  £  Cmxn  with  rank(A)  =  r  <  min{ro,  n} 
and  singular  values  <j\  >  ct2  >  ■  ■  ■  >  or  >  0, 


Induced  £2  norm  ||A||2  =  supy^^  ||Ax||^  =  ax 

Frobenius  norm  ||A||fro  =  \/Tr(A^A)  =  y/ELi  <% 

Nuclearnorm  ||A||+  =  Tr(\/ At  A)  =  E[=i  °» 

(A2) 

In  the  main  theorem  of  CQPT  we  evaluate  the  dis¬ 
tance  between  the  vectorized  form  of  two  process  matrices, 
||X2  —  It  is  interesting  to  see  what  is  the  relation  be¬ 

tween  this  distance  and  a  more  natural  measure  of  distance 
between  two  maps.  A  commonly  used  definition  of  distance 
between  two  quantum  maps  S2  and  (Si  is  |  |«S2  —  Si||  = 
dsup^  B  \tr[AS‘2(B)  —  A(Si(B)]|  for  all  matrices  A  and  B 
such  that  ||A||f  =  ||.B||f  =  1.  This  can  be  equiva¬ 

lently  expressed  as  dsup^  B  |tr[(T2  —  t\)A  ®  B]\  where  ra 
(a  =  1,  2)  is  the  Jamiolkowski  state  equivalence  of  Sa  defined 
as  \  N)0‘l®*5a(|i)0‘|).  The  distance  |  |(S2 -<Si||  is  upper 

bounded  by  d\  |r2  —  r\  |  |*  [4],  Using  the  orthonormality  proper¬ 
ties  of  the  basis  ra  one  can  show  ||r2— Ti | |fro  =  1 1 X2 — Xillfro- 
Finally  we  can  use  1 1  A\ |*  <  -^/t  1 1  A|  |fro  (r  is  the  rank  of  matrix 
A)  to  find  ||x2  -  Xillz,  >  ^dll«S2-«Si||. 


APPENDIX  B:  RESTRICTED  ISOMETRY  PROPERTY 
FROM  A  CONCENTRATION  INEQUALITY 

A  common  approach  to  establish  the  restricted  isometry 
property  (RIP),  Eqn.(3)  in  the  paper,  for  a  matrix  A  £  Cmxn 
with  m  <  n  is  by  introducing  randomness  in  the  elements 
of  this  matrix.  This  approach  benefits  from  measure  concen¬ 
tration  properties  of  random  matrices.  For  QPT  for  the  mea- 

,4 

surment  matrix  <1»  £  Cm  x  in  Eqn.(2)  of  the  paper,  we  show 
how  to  achieve  this  with  random  preparation  of  the  intial  states 
and  a  random  selection  of  the  measurement  operators.  The 
proof  is  based  on  the  results  in  [5]  which  show  that  if  4>  is  a 
random  matrix  which  satisfies  the  concentration  property, 

Pr  {|  \\<S>x\\l  -  \\x\\l  |  >  Ss  \\x\\l}  <  2e~2mC^\  (Bl) 


,4 

for  all  s-sparse  xs  £  C  .  This  version  of  RIP  is  equivalent 
to  Eqn.(3)  in  the  paper. 

In  classical  signal  processing,  each  element  of  the  <1>  ma¬ 
trix  can  be  independently  selected  from  a  random  distribution 
such  as  Gaussian  or  Bernoulli.  For  QPT  there  is  no  freedom 
for  random  independent  selection  of  every  element  of  the  <I> 
matrix.  However,  as  described  in  the  paper,  the  rows  of  <1> 
can  be  independently  and  randomly  selected.  To  see  this,  re¬ 
call  that  for  each  experimental  configuration  we  can  initial¬ 
ize  the  system  randomly  in  a  state  p  £  {pi  £  Cdxd}i=1 
and  then  measure  an  observable  M  randomly  chosen  from 
{Mj  £  Cdxd}j=1.  The  corresponding  matrix  4>  then  has 

m  =  k£  independent  random  rows  {<f>\  £  ClxN}1P=1  with 
correlated  elements  of  each  row  since  they  are  functions  of 
the  same  M  and  p.  Observe,  however,  that  although  $  is  a 
random  matrix,  because  it  is  constructed  from  quantum  states 
and  observables  of  a  finite  dimensional  system,  it  is  bounded. 

,4 

As  a  consequence,  \/x  £  C  ,  we  get, 

(we/m)  \\x\\l2  <  xt(<t>i(j>\)x  <  ( wu/m )  \\x\\2t2 

^  11*11/5,  <  K\\Ax\\l  <  u\\x\\l 

where  E  denotes  expectation  with  respect  to  $  and 
wu,we,u,£  are  constants.  Next  we  apply, 

Hoeffding’s  concentration  inequality  Let  v\ .  ...,vr„  be 
independent  bounded  random  variables  such  that  Vi  falls  in 
the  interval  [a,,  h,]  with  probability  one.  Then  for  S  =  v, 
and  any  t  >  0  we  have, 

Pr  {S'  —  E(S')  >  t}  <  e-2t2/£.(b'-a*)2 
Pr{S- E(5)  <  -1}  <  e-2t2/^^b'~a^2 

In  our  problem  Uj  =  |^»|a;|2  and  S'  =  ||$x||^  .  From  the  above 
inequalities  and  the  relations  in  (B3)  we  find  Vt+,  t_  >  0  and 

\/x, 

Pr{s~u\\x\\l  >4}  <  Pr{S-E(S)  >t+} 

<  e-2t2+/(wu-we)2 

Pr{s-I||a:||Ja  <  <  Pr  {S  -  E(S)  <  -t_} 

<  g-2  t2_/(wu-we)2 


(B5) 


2 


The  choice  of  t+  =  (<5S  +  1  —  u)  \\x\\2^  and  £_  =  [i  —  1  + 

6 a)  ||x||^  in  the  above  inequalities  yields 

Pr  {|  \\<f>X\\l  -  \\xflT  |  >  Ss  \\X\\1}  <  2 e-2m(Ss+er/(Wu-We) 

(B6) 

with  e  =  min{l  —  u,  i  —  1}.  We  also  need  t+  and  t_  to  be 
positive  that  imposes  the  condition  1  —  6S  <  £  <  u  <  1  +  5S. 
Since  the  obervable  M  can  be  scaled  by  any  real  factor,  a 
sufficient  condition  is  u/i  <  (1  +  ) / ( 1  —  (5S). 

Next  we  reproduce  the  connection  between  the  measure 
concentration  (B6)  and  restricted  isometry  as  demonstrated  in 
[5]:  Let  Xs  be  a  set  of  vectors  with  cardinality  s:  #(2fs)  =  s. 
We  choose  a  set  Y  C  Xs  such  that  \\y\\e2  =  1  for  all  y  £  Y , 
we  have  min^y  \\x  —  y \\(2  <  5s/4  for  all  x  £  Xs.  The  car¬ 
dinality  of  such  a  set  Y  can  always  be  chosen  to  be  smaller 
than  (12 /5S)S  [6].  There  from  (B6)  we  find 

Pr  ||  \m\l  -  1|  >  &,t 2}  <  2{12/8s)se 

or  equivalently  1  —  Ss/2  <  ||$t/||^2  <  1  +  Ss/2  holds  with 
probability  exceeding 

P  =  1  -  2(12/ <5S)S  exp(— 2m(5s/2  +  e)2/(wu  -  we)2). 


To  present  all  the  distances  based  on  / 1  -norm  we  can  use 
||y||ii  <  \\y\\h  <  V/£>||t/||;1,  for  a  D-dimensional  vector  y 
and  obtain  the  algorithm  performance  as 

llx*  -  Xo||^  <  - ||xo(s)  -  Xo||^  +  d2C2£  (C2) 

Vs 

However  the  performance  inequality  presented  in  the  paper 
has  a  tighter  bound. 


APPENDIX  D:  SPARSITY  ASSUMPTION  CERTIFICATION 

A  test  to  certify  the  sparsity  assumption  can  be  concluded 
from  (C2)  and  the  probability  of  RIP  being  satisfied  exceed¬ 
ing  1  —  e_mC'3(<5s)  for  to  configurations.  Suppose  an  esti¬ 
mate  Xm  is  obtained  for  m  configurations.  If  the  measure 
||Xm+i  —  Xmllry  which  quantifies  an  incremental  improve¬ 
ment  in  the  estimated  process  matrix,  converges  toward  zero 
for  a  polynomially  large  to,  the  sparsity  assumption  is  certi¬ 
fied. 


APPENDIX  E:  NORMALIZATION  AND  PRECISION  ISSUES 


Define  z  to  be  the  smallest  number  such  that  \\<&x'  lk<l  +  * 
for  all  x'  with  ||a;,||^2  =  1.  For  a  vector  y  £Y  we  have, 

IMU  <  II +  ll*(*'  -  y)\\i2  <  i  + 1  +  (i  +  z)j 

from  which  it  follows  that  z  <  Ss,  for  any  0  <  Ss  <  1. 
In  a  similar  fashion  we  can  prove  1  —  Ss  <  ||<I>;r/||f2.  This 
completes  the  proof  that  RIP  (B2)  holds  with  probability  ex¬ 
ceeding  P  for  all  x  £  Xs.  The  number  of  sets  Xs  with 
#XS  =  s  is  ('Jj  <  ( eN / s)s .  Therefore  RIP  fails  to  be  sat¬ 
isfied  with  probability  2  exp(— 2m(5s/2  +  e)2/ (wu  —  W()2  + 
s[log(eN/s )  +  log(12/ds)]).  For  a  sufficiently  small  constant 
Co,  if  CoS  <  m/log(N/s),  we  can  find  a  constant  0  <  C3 
such  that  the  probability  of  a  failure  of  RIP  becomes  smaller 
than  exp(— C3?n)  provided  that  C3  <  2m(Ss/2  +  e)2/(wu  — 
we)2  —  s[log(eN / s)  +  log(12 / S s)].  This  guaranteed  exponen¬ 
tially  small  chance  of  RIP  failure  is  the  key  to  the  logarithmic 
scaling  of  the  resources  in  CQPT.  If  RIP  is  satisfied  the  l\ 
norm  minimization  algorithm  works  to  find  a  sparse  solution. 
Here  we  proved  that  by  increasing  the  number  of  configura¬ 
tions  to  would  exponentially  decrease  the  chance  of  RIP  fail¬ 
ure.  This  completes  the  connection  between  the  concentration 
measure  (B6)  and  the  restricted  isometry  property. 


APPENDIX  C:  PERFORMANCE  OF  THE  ALGORITHM 


In  Ref.  [2],  the  accuracy  of  the  £3 -norm  minimization  problem 
is  given  by  (C2).  The  parameters  C\  and  C2  are  explicitly 
given  in  terms  of  the  isometry  constant  5S: 


2  +  {2^2  -  2)Ss 
1-  (V2  +  1  )Ss  ’ 


In  the  formulation  of  CQPT  a  random  selection  of  the  expec¬ 
tation  values  ijm,  .fi,  are  not  available  in  our  experiment.  Due 
to  photon  loss  the  detector  counts  are  not  conclusive,  hence,  a 
complete  set  of  counts  corresponding  to  a  complete  set  of  ob¬ 
servables  is  required  to  produce  meaningful  expectation  val¬ 
ues  yMi,pi  ■  A  solution  to  this  problem  is  to  limit  the  measure¬ 
ments  to  few-body  observables.  For  fc-body  measurements 
a  total  number  of  2k  complementary  observables  need  to  be 
measured.  Since  to,  the  number  of  measurements,  is  expo¬ 
nentially  small  we  can  choose  k  limited  to  few-body  opera¬ 
tors,  k  =  kmax,  and  even  single -body  as  we  did  in  the  exper¬ 
iment.  For  a  fully  random  selection  of  observables,  the  total 
number  of  measurements  to  will  be  increased  by  a  constant 
factor  2klax .  Still  this  number  is  exponentially  small.  This  re¬ 
dundancy,  however,  can  be  avoided  by  using  the  outcomes  of 
all  2k  observables.  This  selection  scheme  is  not  fully  random, 
rather  it  is  a  deterministic-random  way  of  choosing  observ¬ 
ables. 

As  discussed  in  the  paper,  random  selections  of  probabili¬ 
ties  from  the  full  data  set,  although  exhibiting  results  which 
are  entirely  consistent  with  compressive  sensing  theory,  are 
inconsistent  with  how  data  is  actually  collected  in  this  kind 
of  standard  photonic  experiment.  In  practice  we  are  limited 
to  measure  few-body  observables.  For  low  to,  the  configura¬ 
tions  must  allow  for  normalisation,  i.e.  we  are  restricted  in 
how  random  our  low-number  selections  can  be.  A  selection 
of  some  of  these  permissible  configurations  are  shown  in  Ta¬ 
ble  I.  Here  we  see  some  of  the  remarkable  results  promised  by 
the  theory  of  compressed  sensing,  e.g.,  a  98%  fidelity  from  32 
configurations  and  a  94%  fidelity  from  only  18  configurations. 

Another  issue  to  consider  is  experimental  precision.  The 
expectation  values  of  fc-body  observables  of  random  states  re¬ 
duce  for  a  larger  fc.  This  implies  the  need  for  a  larger  number 
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Inputs 

Observables 

m 

F{X5 76,  Xm) 

T(Ucz,X™) 

HVDRx2 

HVDARLx2 

576 

1 

0.88 

HVDRx2 

{RI,IR} 

32 

0.98 

0.89 

{DI,ID} 

0.97 

0.87 

HVDRx2 

RLx2 

64 

0.95 

0.86 

DAxDA 

0.95 

0.86 

VDRx2 

{RI,IR} 

18 

0.94 

0.86 

{DI.ID} 

0.93 

0.88 

VDRx2 

RLx2 

36 

0.94 

0.87 

DAx2 

0.94 

0.84 

TABLE  I:  Fidelity  assessment  of  some  selected  configurations  that 
are  available  in  our  experiment. 

of  statistical  samples.  Fortunately,  this  issue  is  not  a  problem 
for  our  scheme  since  we  can  take  k  as  small  as  we  want,  as 
discussed  above. 


APPENDIX  F:  CLASSICAL  POSTPROCESSING 

The  estimation  results  computed  from  the  experimental  data 
were  all  obtained  by  solving  equation  2  in  the  main  text  by 
using  “off-the-shelf”  MATLAB  based  software.  Specifically, 
we  used  YALMIP  to  call  the  convex  solver  SDPT3  [7,  8].  On  a 
standard  desktop  it  takes  about  2  sec  of  CPU-time  to  solve  (2) 
for  the  full  576  configuration  set.  This  software  can  handle  3- 
qubit  systems  but  it  is  more  advisable  to  migrate  to  more  spe¬ 
cialized  software  where  orders  of  magnitude  speed  increases 
are  possible,  e.g.,  [3]. 


APPENDIX  G:  EXPERIMENTAL  DETAILS 

The  quantum  gate  used  in  the  experiment  is  a  photonic 
controlled-phase  gate.  Fig.  1  [9].  It  is  based  on  a  single  par¬ 
tially  polarising  beam  splitter  (PPBS),  having  different  reflec¬ 
tivity,  r]v=\,  r///=0,  for  the  horizontal  and  the  vertical  polar¬ 
isation  of  input  photons.  Due  to  two-photon  interference,  the 
input  state  |  VV)  undergoes  a  n  phase  shift  |  VV)  — >  —  |  VV) 
whenever  the  two  photons  leave  the  PPBS  through  different 
output  ports.  Correct  operation  of  the  gate  is  signalled  by  a 
coincidence  detection  in  these  output  modes;  the  gate  is  thus 
probabilistic,  with  a  success  probability  of  1/9. 

The  gate  acts  on  photonic  qubits  created  via  spontaneous 
parametric  downconversion  (SPDC).  Downconversion  is  in¬ 
trinsically  a  random  process:  consequently  the  created  states 
contain  small  amounts  of  higher-order  emission — e.g.  j  22) 
as  well  as  the  desired  1 1 1) — which  appear  as  decoherence  in 
a  quantum  process  [10,  11].  The  ratio  of  higher  order  terms 
to  the  desired  photon  pair  number  increases  with  the  pair  cre¬ 
ation  probability,  which  in  turn  is  proportional  to  the  pump 
laser  power.  Once  can  therefore — to  some  extent — control  the 
decoherence  in  a  process  via  the  laser  power. 

In  order  to  cover  a  comprehensive  range  of  decoherence,  we 
performed  six  experiments  with  2-photon  states  directly  cre¬ 


beam-splitter  (reflectance  r|=1/3)  PBS 


FIG.  1 :  Detailed  representation  of  the  CZ-gate  in  dual  rail  encoding. 
Each  qubit  is  represented  by  two  paths,  one  for  each  logical  basis 
state,  |0)  =  \H)  and  |1)  =  |V)  [9]. 


Purity  (x576) 


FIG.  2:  Fidelities  vs.  purities  for  m  =  32  corresponding  to  the 
configurations  in  Table  I. 


ated  via  a  single  SPDC  emission,  and  one  experiment  with 
4-photon  states  created  in  two  independent  SPDC  sources, 
where  one  photon  of  each  SPDC  process  was  used  as  a  trigger. 
The  latter  experiment  is  more  representative  of  large-scale 
systems,  where  independent  photon  sources  will  be  required. 
It  has  significantly  reduced  count  rates,  and  reduced  two- 
photon  interference  between  photons  in  the  quantum  gate  due 
to  both  the  pump-induced  decoherence  and  group-velocity 
mismatch  [11],  reflected  in  the  low  purity  of  the  process  in 
this  case  of  0.62. 

Typical  count  rates  for  2-photon  experiments  are  2000  co¬ 
incident  counts  per  second,  full  QPT,  building  up  reasonable 
statistics,  takes  about  2.5  hours;  in  contrast,  4-photon  experi¬ 
ments  have  much  lower  rates,  1  four-fold  coincidence  per  sec¬ 
ond,  and  take  2  days.  The  32-configuration  CQPT  reduces 
tomography  times  to  8  minutes  and  2.6  hours  respectively:  a 
clear  advantage. 

Fig.  2  shows  the  effect  of  varying  laser  pump  power  on 
CQPT  estimation  accuracy  for  one  of  the  single-observable 
configurations  from  Table  I.  Specifically  for  the  32  configura¬ 
tions  arising  from  all  combinations  of  the  16  inputs  HVDRx- 
HVDR  and  2  outcomes  {RI.IR}.  As  pump  power  increases, 
the  process  purity,  as  measured  by  Tr(x|76)/16  decreases; 
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effectively  the  signal  to  noise  ratio  deteriorates.  As  might  be 
expected,  the  worst-case  fidelity  decreases  with  process  pu¬ 
rity.  The  estimated  channel  fidelity  is  however  remarkably 
robust,  staying  very  close  to  the  actual  channel  fidelity. 


APPENDIX  H:  SIMULATION  RESULTS 

QPT  is  performed  by  solving  Eqn.  (2)  in  the  paper  with 
noise-free  experiments  (e  =  0)  for  a  system  designed  to  be 
a  2-qubit  quantum  Fourier  transform  (QFT)  with  unitary  rep¬ 
resentation  Uqft  £  C4x4,  which  interacts  with  an  unknown 
environment  via  the  total  constant  (time-independent)  Hamil¬ 
tonian,  H=Ie  0  //qft  +7  / 1  with  H  randomly  selected  and 
normalized  to  ||fT||=l;  7  is  thus  the  interaction  magnitude. 
The  simulated  system  Xsim  £  C16x16  is  extracted  via  the 
partial  trace  over  the  environment  for  7  £  {0.5, 1.0, 1.25}. 
Each  of  these  induces  a  fidelity  with  respect  to  the  ideal  uni¬ 
tary,  T ( U qft ,Xsim )  £  {0.70,0.80,0.95}  The  estimates  from 
Eqn.  (2)  in  the  paper  are  obtained  in  the  singular  value  de¬ 
composition  (SVD)-basis  in  Eqn.(l)  of  the  paper  of  the 
ideal  QFT.  The  process  matrix  of  the  ideal  unitary  in  this  ba¬ 
sis  is  maximally  sparse  with  the  single  non-zero  1,1-element 
equal  ton  =  4  [  1  ] .  The  environmental  interactions  make  the 
process  matrix  nearly  sparse,  i.e.,  compressible. 

To  form  the  measurement  matrix  <l>  £  <rmx  256^  we  ran_ 
domly  generated  4  and  16  input  pairs  |-0i)  0  \1jj2)  and  2,  4, 
and  6  random  selections  from  the  single-body  Pauli  observ¬ 
ables  { IX ,  IY,  IZ ,  XI,  YI,  ZI}.  This  gives  6  configurations 
with  m  £  {8, 16,  24,  32, 64, 96},  for  which  u/£  «  1.3  ensur¬ 
ing  6  ~  0.13.  Fig.  3  shows  the  fidelities  T{xm,X sim)  of  the 
reconstructed  estimates  Xm  and  the  simulated  process  matri¬ 
ces  Xsim  for  all  18  combinations  of  m  and  interaction  magni¬ 
tudes  7. 

These  results  arise  from  the  relative  sparsity  of  the  process 
matrix  in  the  S  VD-basis  of  the  ideal  QFT.  Fig.  4  shows  3D  bar 
plots  of  the  real  and  imaginary  elements  of  the  true  and  esti¬ 
mated  process  matrices  for  m  =  64,  TlJJqft,  Xsim)  =  0.70, 
and  ^r(X64i  Xsim)  =  0.93.  In  the  SVD-basis  (row  2),  the  true 
process  matrix  exhibits  the  expected  large  1,1 -element  with 
the  remaining  elements  much  smaller  by  comparison.  The  es¬ 
timated  channel  fidelity  is  0.7 1 . 

In  Fig.  3,  J~  (xm >  Xsim)  (white  bars)  trends  to  increase  with 
to,  more  so  for  T  =  0.7  than  for  T  =  0.95,  and  rises  a  bit 
sharply  at  different  to  values.  Just  as  for  the  experimental 
results,  this  can  be  connected  to  the  actual  sparsity  of  the  sim¬ 
ulated  process  matrices.  Figure  5,  just  like  Fig. 4  in  the  main 
text,  shows  the  absolute  sorted  process  matrix  elements  rela¬ 
tive  to  the  1,1 -element.  Where  each  plot  crosses  the  threshold 
of  0.02,  we  see  that  the  number  of  elements  above  this  value 
increases  with  decreasing  decoherence  7.  If  these  are  taken 
as  the  s-sparse  approximation  levels  indicated  in  the  theory, 
Eqn.  (4)  in  the  paper,  then  (approximately)  s  £  {30,  50, 100} 
correspond  to  lF(Uqft,  Xsim)  £  {0.95,0.80.0.70}.  This  corre¬ 
lates  well  with  how  T(xm,  Xsim)  varies  with  resources  to. 


FIG.  3:  Fidelities  vs.  configurations  for  each  process  matrix  estimate 
Xm  from  Eqn.  (2)  in  the  paper  in  the  SVD  basis  of  the  ideal  QFT  uni¬ 
tary.  Black  bars:  simulated  compared  to  ideal  process  ^(Uqft,  Xsim). 
Gray  bars:  estimate  compared  to  ideal  lF(Uqft,Xm)-  White  bars: 
estimate  compared  to  simulated  process  lF(xm,  Xsim). 


FIG.  4:  Real  and  imaginary  x  elements  for  m=64,  T(JJ qft,  Xsim)  = 
0.71.  7=1.25.  Row  1:  True  process  matrix  in  the  natural  ba¬ 
sis.  Row  2:  True  process  matrix  in  SVD-basis  of  ideal  unitary. 
Row  3:  Estimated  process  matrix  projected  to  the  natural  basis  , 
T(XJq  f,,Xm)=0.71. 
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APPENDIX  I:  BEYOND  2-QUBITS 


FIG.  5:  Absolute  values  of  the  256  process  matrix  elements  of 
Xsvd*3  £  C16x16  sorted  by  relative  magnitude  (with  respect  to  the 
11-element)  for  _F(I/qft,  Strue)  £  {0.95,  0.80,  0.70}. 


Absolute  values  of  x  in  SVD  basis 


■  d=21 2,  p=240,  F  =  0.84 

•  d=23,  p=4032,  F  =  0.83 

d=24 5 6,  p=65,280,  F  =  0.85 

10' 


Sorted  index 


FIG.  6:  Absolute  values  of  the  process  matrix  elements  sorted  by 
relative  magnitude  (with  respect  to  the  11-element)  all  in  the  ideal 
SVD  basis  (in  this  case  for  an  identity  operator  on  the  system)  for 
three  cases:  blue  Xsim  £  C16x16  with  jF(/4,Xsim)  =  0.84;  red 
Xsim  £  c64x64  with  JF(/s,  Xsim)  =  0.83;  green  Xsim  £  C256x256 
with  JF(/ie,  Xsim)  =  0.85; 


Standard  QPT  scales  exponentially,  thus  for  3  and  4  qubits 
the  number  of  required  experimental  configurations  is,  respec¬ 
tively  4,032  and  65,280.  As  we  have  shown  theoretically,  ex- 
perimentaly,  and  lastly  via  the  previous  simulations,  CQPT 
shows  quite  a  different  scaling.  Fig.  6  shows  the  absolute 
values,  sorted  by  relative  magnitude,  of  the  process  matrices 
arising  from  a  random  selection  of  a  perturbed  system  near 
identity,  i.  e. ,  a  quantum  memory,  corresponding  to  similar  fi¬ 
delities.  The  process  matrices  elements  are  shown  in  a  ba¬ 
sis  corresponding  to  the  ideal  identity.  Again  taking  0.01  as 
a  threshold  we  see  that  for  2-qubits  we  get  m  ~  30  which 
is  similar  to  our  experimental  results  and  those  supported  by 
the  plots  in  Figures  4  in  the  main  text  and  here  in  5.  Fig.  6 
predicts  for  3-qubits  m  «  100,  and  for  4-qubits  m  ss  300. 
These  simulation  results  show  first  that  the  process  matrices 
are  compressible,  and  in  addition  are  consistent  with  the  ex¬ 
perimental  results  in  Fig.  4  in  the  main  text.  To  actually  per¬ 
form  the  estimaton,  that  is  solve  Eqn.  (2)  in  the  paper,  as  pre¬ 
viously  mentioned,  requires  specialized  compressed  sensing 
algorithms  optimized  for  speed  and  efficiency,  e.g.,  [3]. 
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We  develop  an  efficient  and  robust  approach  for  quantum  measurement  of  nearly-sparse 
many-body  quantum  Hamiltonians  based  on  the  method  of  compressive  sensing.  This 
work  demonstrates  that  with  only  0(s  log(d))  experimental  configurations,  consisting  of 
random  local  preparations  and  measurements,  one  can  estimate  the  Hamiltonian  of  a  d- 
dimensional  system,  provided  that  the  Hamiltonian  is  nearly  s-sparse  in  a  known  basis. 
The  classical  post-processing  is  a  convex  optimization  problem  on  the  total  Hilbert  space 
which  is  generally  not  scalable.  We  numerically  simulate  the  performance  of  this  algo¬ 
rithm  for  three-  and  four-body  interactions  in  spin-coupled  quantum  dots  and  atoms  in 
optical  lattices.  Furthermore,  we  apply  the  algorithm  to  characterize  Hamiltonian  fine 
structure  and  unknown  system-bath  interactions. 
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We  develop  an  efficient  and  robust  approach  to  Hamiltonian  identification  for  multipartite  quantum  systems 
based  on  the  method  of  compressed  sensing.  This  work  demonstrates  that  with  only  0(s  log(d))  experimental 
configurations,  consisting  of  random  local  preparations  and  measurements,  one  can  estimate  the  Hamiltonian 
of  a  d-dimensional  system,  provided  that  the  Hamiltonian  is  nearly  s-sparse  in  a  known  basis.  We  numerically 
simulate  the  performance  of  this  algorithm  for  three-  and  four-body  interactions  in  spin-coupled  quantum  dots 
and  atoms  in  optical  lattices.  Furthermore,  we  apply  the  algorithm  to  characterize  Hamiltonian  fine  structure 
and  unknown  system-bath  interactions. 

PACS  numbers: 


I.  INTRODUCTION 

The  dynamical  behavior  of  multipartite  quantum  systems  is 
governed  by  the  interactions  amongst  the  constituent  particles. 
Although,  the  physical  or  engineering  considerations  may 
specify  some  generic  properties  about  the  nature  of  quantum 
dynamics,  the  specific  form  and  the  strength  of  multi-particle 
interactions  are  typically  unknown.  Additionally,  quantum 
systems  usually  have  an  unspecified  interaction  with  their 
surrounding  environment.  In  principle,  one  can  characterize 
quantum  dynamical  systems  via  “quantum  process  tomogra¬ 
phy”  (QPT)  [1-8].  However,  the  relationship  between  rele¬ 
vant  physical  properties  of  a  system  to  the  information  gath¬ 
ered  via  QPT  is  typically  unknown.  Alternatively,  knowledge 
about  the  nature  of  inter-  and  intra-  many-body  interactions 
within  the  system  and/or  its  environment  can  be  constructed 
by  identifying  a  set  of  (physical  or  effective)  Hamiltonian  pa¬ 
rameters  generating  the  dynamics  [9-18].  Currently,  a  scal¬ 
able  approach  for  efficient  estimation  of  a  full  set  of  Hamilto¬ 
nian  parameters  does  not  exist. 

The  dynamics  of  a  quantum  system  can  be  estimated  by 
observing  the  evolution  of  some  suitable  test  states.  This  can 
be  achieved  by  a  complete  set  of  experimental  configurations 
consisting  of  appropriate  input  states  and  observables  mea¬ 
sured  at  given  time  intervals.  Knowledge  about  the  dynam¬ 
ics  may  then  be  reconstructed  via  inversion  of  the  laboratory 
data  by  fitting  a  set  of  dynamical  variables  to  the  desired  accu¬ 
racy.  Estimating  Hamiltonian  parameters  from  such  a  proce¬ 
dure  faces  three  major  problems:  (1)  The  number  of  required 
physical  resources  grows  exponentially  with  the  degrees  of 
freedom  of  the  system  [1-8],  (2)  There  are  inevitable  statis¬ 
tical  errors  associated  with  the  inversion  of  experimental  data 
[1-8],  (3)  The  inversion  generally  involves  solving  a  set  of 
nonlinear  and  non-convex  equations,  since  the  propagator  is 
a  nonlinear  function  of  Hamiltonian  parameters  [9-18],  The 
first  two  problems  are  always  present  with  any  form  of  quan¬ 
tum  tomography,  but  the  last  problem  is  specific  to  the  task 
of  Hamiltonian  identification  as  we  wish  to  reconstruct  the 
generators  of  the  dynamics.  Many  quantum  systems  involve 
two-body  local  interactions,  so  the  goal  is  often  to  estimate 
sparse  Hamiltonians  with  effectively  a  polynomial  number  of 


unknown  parameters.  Unfortunately,  quantum  state  and  pro¬ 
cess  tomography  cannot  readily  exploit  this  potentially  useful 
feature. 

The  highly  nonlinear  feature  in  the  required  inversion  of 
laboratory  data  was  studied  in  Ref.  [9]  in  which  closed-loop 
learning  control  strategies  were  used  for  the  Hamiltonian 
identification.  In  that  approach  one  estimates  the  unknown 
Hamiltonian  parameters  by  tailoring  shaped  laser  pulses  to 
enhance  the  quality  of  the  inversion.  Identification  of  time- 
independent  (or  piece-wise  constant)  Hamiltonians  have  been 
studied  for  single-qubit  and  two-qubit  cases  [13,  14]  to  verify 
the  performance  of  quantum  gates.  Estimation  of  these  Hamil¬ 
tonians  is  typically  achieved  via  monitoring  the  expectation 
values  of  some  observable,  e.g.  concurrence,  which  are  time 
periodic  functions.  Through  Fourier  transform  of  this  signal 
the  identification  task  is  reduced  to  finding  the  relative  loca¬ 
tion  of  the  peaks  and  heights  of  the  Fourier  spectrum  [13,  14]. 
Bayesian  analysis  is  another  method  proposed  for  robust  es¬ 
timation  of  a  two-qubit  Hamiltonian  [15].  The  difficulty  with 
these  methods  is  then  scalability  with  the  size  of  the  system. 
A  symmetrization  method  for  efficient  estimation  of  the  mag¬ 
nitude  of  effective  two-body  error  generators  in  a  quantum 
computer  was  studied  in  [16]  by  monitoring  quantum  gate  av¬ 
erage  fidelity  decay.  Recently,  it  was  demonstrated  that  direct 
or  selective  QPT  schemes  could  be  used  for  efficient  identifi¬ 
cation  of  short-time  behavior  of  sparse  Hamiltonians  [17]  as¬ 
suming  controllable  two-body  quantum  correlations  with  aux¬ 
iliary  systems  and  the  exact  knowledge  of  the  sparsity  pattern. 
Another  scheme  for  the  determination  of  the  coupling  param¬ 
eters  in  a  chain  of  interacting  spins  with  restricted  controlla¬ 
bility  was  introduced  in  Ref.  [18]. 

In  this  work,  inspired  by  recent  advances  in  classical  signal 
processing  known  as  compressed  sensing  [19],  we  use  random 
local  input  states  and  measurement  observables  for  efficient 
Hamiltonian  identification.  We  show  how  the  difficulties  with 
the  nonlinearity  of  the  equations  can  be  avoided  by  either  a 
short  time  or  a  perturbative  treatment  of  the  dynamics.  We 
demonstrate  that  randomization  of  the  measurement  observ¬ 
ables  enables  compressing  the  extracted  Hamiltonian  infor¬ 
mation  into  a  exponentially  smaller  set  of  outcomes.  This  is 
accomplished  by  a  generalization  of  compressed  sensing  to 
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utilize  random  matrices  with  correlated  elements.  This  ap¬ 
proach  is  applicable  for  Hamiltonians  that  are  nearly  sparse 
in  a  known  basis  with  an  arbitrary  unknown  sparsity  pattern 
of  parameters.  The  laboratory  data  can  then  be  inverted  by 
solving  a  convex  optimization  problem.  This  algorithm  is 
highly  tolerant  to  noise  and  experimental  imperfections.  The 
power  of  this  procedure  is  illustrated  by  simulating  three-  and 
four-body  Hamiltonians  for  neutral  atoms  in  an  optical  lattice 
and  spin-coupled  quantum  dot  systems,  respectively.  Further¬ 
more,  we  directly  apply  the  algorithm  to  estimate  Hamiltonian 
fine  structure  and  characterize  unknown  system-bath  interac¬ 
tions  for  open  quantum  systems. 


II.  QUANTUM  DYNAMICAL  EQUATIONS 

The  time  evolution  of  a  quantum  system  in  a  pure  state 
is  governed  by  the  Shrodinger  equation,  d\ip(t))/dt  = 
—iH\ip(t)).  The  solution  of  this  equation  for  a  time- 
independent  Hamiltonian  can  be  simply  expressed  as  |  ip(t))  = 
exp(— itH)  |^(0)>.  In  principle,  the  Hamiltonian  of  the  sys¬ 
tem  H  can  be  estimated  by  preparing  an  appropriate  set  of  test 
states  (IV'fe)}  and  measuring  the  expectation  value  of  a  set  of 
observables  {Mj}  after  the  system  has  evolved  for  a  certain 
period  of  time.  The  expectation  value  of  these  observables  can 
be  expressed  as 

Pjk  =  (Mj)^k  =  (ipk\  eitHMje-itH  IV’fc)  (1) 

Equation  f  1)  implies  that  the  experimental  outcomes  {pjk} 
are  nonlinear  functions  of  the  Hamiltonian  parameters.  To 
avoid  the  difficulties  of  solving  a  set  of  coupled  nonlinear 
equations  we  consider  the  short  time  behavior  of  the  system. 
Monitoring  the  short  time  dynamics  of  the  system  is  valid 
when  the  relevant  time  scales  of  the  system  evolution  sat¬ 
isfy  t  <C  K  ~ 1  where,  for  positive  operator-valued  measure 
(POVM)  operators  {Mj},  the  constant  K  equals  2||fT||spec. 
The  general  expression  of  K  is  given  in  appendix  B,  also  see 
appendix  A  for  definition  of  the  norms.  This  yields  the  lin¬ 
earized  form  of  the  Eq.  (1) 

Pjk  =  (ipk\ M.j  | ipk)  +  it  (ipk\  { H ,  Mj]  | ipk)  +  0(K2t2)  (2) 

The  linear  approximation  contains  enough  information  to 
fully  identify  the  Hamiltonian  and  the  higher  order  terms  do 
not  provide  additional  information.  The  short-time  approxi¬ 
mation  implies  prior  knowledge  about  the  system  dynamical 
time-scale  or  the  order  of  magnitude  of  ||Tf||Spec-  This  prior 
knowledge  can  be  available  from  generic  physical  and  engi¬ 
neering  considerations.  For  example,  in  solid-state  quantum 
devices  the  time-scale  of  single  qubit  rotations  is  typically  on 
the  order  of  1-10  ns.  The  switching  time  for  exchange  inter¬ 
actions  varies  among  different  solid-state  systems  from  lps  to 
lOOps,  (for  more  details  see  appendix  B.) 

We  expand  the  Hamiltonian  in  an  orthonormal  basis  { 
where  TrfT^r^)  =  dSa H  =  Ya  hcYa  .  Here  d  is  the  di¬ 
mension  of  the  Hilbert  space.  In  this  representation  the  Hamil¬ 
tonian  parameters  are  the  coefficients  ha.  The  expanded  form 


of  the  above  affine  equation  (2)  is 

Pjk  —  it  E  ('tPk\[ra,Mj}\i>k)ha  (3) 

O' 

Here  we  introduce  the  experimental  outcomes  as  pjk  =  Pjk  — 
(ipk I  Mj  | ipk),  since  (ipk\  Mj  \ ipk)  is  a  priori  known.  The  re¬ 
lation  (3)  corresponds  to  a  single  experimental  configuration 
(Mj,\ipk)).  For  a  d-dimensional  system,  the  total  number  of 
Hamiltonian  parameters  ha  is  d2.  Thus,  one  requires  the  same 
number  of  experimental  outcomes,  pjk  that  leads  to  d2  lin¬ 
early  independent  equations.  For  a  system  of  n  qubits,  this 
number  grows  exponentially  with  n  as  d  =  22n.  In  order 
to  devise  an  efficient  measurement  strategy  we  will  focus  on 
physically  motivated  nearly  sparse  Hamiltonians. 

A  Hamiltonian  H  is  considered  to  be  s-sparse  if  it  only  con¬ 
tains  s  non-zero  parameters  {ha}.  More  generally,  a  Hamilto¬ 
nian  H  is  termed  nearly  s-sparse,  for  a  threshold  77,  if  at  most  s 
coefficients  ha  ( H  =  Y  haTa)  have  magnitude  greater  than 
phmax  where  hmax  =  ma x(h.a).  By  definition,  the  sparsity 
is  basis  dependent.  However,  for  local  interactions,  the  basis 
in  which  the  Hamiltonian  is  sparse  is  typically  known  from 
physical  or  engineering  considerations. 

III.  COMPRESSED  HAMILTONIAN  ESTIMATION 

Our  algorithm  is  based  on  general  methods  of  so-called 
compressed  sensing  that  recently  have  been  developed  in  sig¬ 
nal  processing  theory  [19],  Compressed  sensing  allows  for 
condensing  signals  and  images  into  a  significantly  smaller 
amount  of  data,  and  recovery  of  the  signal  becomes  possi¬ 
ble  from  far  fewer  measurements  than  required  by  traditional 
methods. 

Compressed  sensing  has  two  main  steps:  encoding  and  de¬ 
coding.  The  information  contained  in  the  signal  is  mapped 
into  a  set  of  laboratory  data  with  an  exponentially  smaller  rep¬ 
resentation.  This  compression  can  be  achieved  by  randomiza¬ 
tion  of  data  acquisition.  The  actual  signal  can  be  recovered 
via  an  efficient  algorithm  based  on  convex  optimization  meth¬ 
ods.  Compressed  sensing  has  been  applied  to  certain  quantum 
tomography  tasks.  Standard  compressed  sensing  has  been  di¬ 
rectly  used  for  efficient  pseudothermal  ghost  imaging  [20, 21]. 
Recently,  a  quadratic  reduction  in  the  total  number  of  mea¬ 
surements  for  quantum  tomography  of  a  low  rank  density  ma¬ 
trix  has  been  demonstrated  using  a  compressed  sensing  ap¬ 
proach  [22], 

Here,  we  first  describe  how  the  Hamiltonian  information  is 
compressed  into  the  experimental  data.  The  output  of  a  single 
measurement  is  related  to  the  unknown  signal  (Hamiltonian 
parameters)  through  the  relation  (3).  Suppose  we  try  to  dif¬ 
ferent  experimental  configurations  (i.e.,  to  different  pairs  of 
(Mj,  | ipk)))-  This  yields  a  set  of  linear  equations 

i?  =  d>  h  (4) 

where  $  is  a  m  x  d2  matrix  with  elements  ^jk,a  = 
it/y/m(ipk  |  [Ta,Mj]  | ipk)  (A  factor  1  /sjrn  is  included  for 
simplifying  the  proofs,  appendix  C).  In  general  to  has  to  be 
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greater  than  or  equal  to  d2  in  order  to  solve  Eq.  (4).  A  Hamil¬ 
tonian  estimation  attempt  with  m  <  d2  seems  impossible  as 
we  face  an  underdetermined  system  of  linear  equations  with 
an  infinite  number  of  solutions.  However,  any  two  s-sparse 
Hamiltonians  hi  and  /12  still  can  be  distinguished  via  a  prop¬ 
erly  designed  experimental  setting,  if  the  measurement  matrix 
$  preserves  the  distance  between  hi  and  /12  to  a  good  approx¬ 
imation: 

(i-ss)\\h2-hi\\i  <  11^2-^1)11^  <  (i+<yii/i2-Ml/2 

(5) 

for  a  constant  Ss  £  (0, 1).  A  smaller  Ss  ensures  higher  distin- 
guishability  of  s-sparse  Hamiltonians.  The  inequality  relation 
(5)  is  termed  a  restricted  isometry  property  (RIP)  of  the  matrix 
$  [23].  We  now  discuss  how  to  construct  a  map  $  satisfying 
this  inequality,  and  how  small  the  value  of  m  can  be  made. 

The  RIP  (5)  for  a  matrix  $  can  be  established  by  employ¬ 
ing  the  measure  concentration  properties  of  random  matrices. 
In  each  experiment  the  test  state  and  the  measurement  ob¬ 
servable  can  be  drawn  randomly  from  a  set  of  configurations 
{ Mj ,  \ipk)}  realizable  in  the  laboratory.  The  independent  se¬ 
lection  of  ?/)/.)  and  Mj  leads  to  a  matrix  $  with  independent 
rows  but  correlated  elements  <f fk,a  in  each  row.  Thus  the  stan¬ 
dard  results  from  compressed  sensing  theory  are  not  applica¬ 
ble  here  (appendix  C). 

In  contrast,  here  we  derive  a  concentration  inequality  for  a 
matrix  with  independent  rows  and  correlated  columns  as  the 
backbone  for  the  RIP  of  our  quantum  problem  in  appendix  C. 
Using  Hoeffding’s  inequality,  we  show  that  for  any  Hamilto¬ 
nian  h  and  a  random  matrix  $  with  column  only  correlations, 
the  random  variable  ||<f>/i||2  is  concentrated  around  ||/i||2  with 
a  high  probability,  i.e.  V  0  <  S  <  1 

Prob.{|||M||22  - \\h\\l\  >  S\\h\\22}  <  2e— oU+cO2  (6) 
for  some  constants  cq  and  ci . 

Using  the  above  inequality,  now  we  can  show  how  an  ex¬ 
ponential  reduction  in  the  minimum  number  of  the  required 
configurations  can  be  achieved  for  Hamiltonian  estimation. 
The  inequality  (6)  is  defined  for  any  h  while  the  inequal¬ 
ity  in  the  definition  of  RIP,  Eq.(5),  is  for  any  s-sparse  h. 
As  shown  in  Ref.  [24],  there  is  an  inherent  connection 
between  these  two  inequalities.  It  is  proved  that  any  ma¬ 
trix  $  satisfying  (6)  has  RIP  with  probability  greater  than 
1  -  2  exp (— mc0(5|  +ci)2  +  s[log(d4/s)  +  log(12e/<5|)]). 
In  addition,  whenever  m  >  c-^s  log(c/4 / s) ,  for  a  sufficiently 
large  constant  C2  one  can  find  a  constant  C3  >  0  such  that  the 
likelihood  of  the  RIP  to  be  satisfied  converges  exponentially 
fast  to  unity  as  1  —  2  exp(— C3TO.). 

The  set  of  experimental  configurations  defined  by  Eq  (4), 
and  the  concentration  properties  given  by  Eq  (5)  and  (6)  can 
be  understood  as  encoding  the  information  of  a  sparse  Hamil¬ 
tonian  into  a  space  with  a  lower  dimension.  Next  we  need 
to  provide  an  efficient  method  for  decoding  in  order  to  re¬ 
cover  the  original  Hamiltonian.  The  decoder  is  simply  the 
minimizer  of  the  (1  norm  of  the  signal  h.  Implementing  this 
decoder  is  a  special  convex  optimization  problem,  which  can 
be  solved  via  fast  classical  algorithms,  yet  not  stricktly  scal¬ 
able.  Furthermore,  the  encoding/decoding  scheme  is  robust  to 


noisy  data  as  |  \p'  —  <f>/i| \i2  <  e  where  e  is  the  noise  threshold. 
Note  that  e  includes  the  error  of  linearization  (see  Eq.(2))  that 
is  0(i/rn.Kt2).  Denote  ho  as  the  true  representation  of  the 
Hamiltonian.  For  a  threshold  //,  ho(s)  is  an  approximation  to 
ho  obtained  by  selecting  the  s  elements  of  ho  as  those  that  are 
larger  than  t]hmax  and  setting  the  remaining  elements  to  zero. 
Now  we  state  our  main  result: 


IV.  ALGORITHM  EFFICIENCY 

,4 

If  the  measurement  matrix  $  G  <Cmxd  is  drawn  randomly 
from  a  probability  distribution  that  satisfies  the  concentration 
inequality  in  (5)  with  Ss  <  s/2  —  1,  then  there  exist  constants 
C2,C3,di,d2  >  0  such  that  the  solution  h*  to  the  convex  opti¬ 
mization  problem, 

minimize  ||ft.||q 

subject  to  || p1  —  H>/i|  |^2  <  e,  (7) 

satisfies, 

II h*  -  /loll, 2  <  -  *0 1 Ui  +  d2e  (8) 

with  probability  >  1  —  2e_m°3  provided  that, 

to  >  c2slog(c/4/s),  (9) 

where  the  performance  of  a  li  minimizer,  Eq.  (8),  and  the 
necessary  bound  5  s  <  s/2  —  1  are  derived  by  Candes  in  Ref. 
[25], 

As  an  example,  for  a  system  consisting  of  n  interacting 
qubits,  the  exponential  number  of  parameters  describing  the 
dynamics,  22n,  can  be  estimated  with  a  linearly  growing  num¬ 
ber  of  experiments  to  >  C2s(81og(2)n  —  log(s)).  The  sec¬ 
ond  term,  c^e,  indicates  that  the  algorithmic  performance  is 
bounded  by  the  experimental  uncertainties.  Consequently,  for 
fully  sparse  Hamiltonians  and  e  =  0  the  exact  identification 
of  an  unknown  Hamiltonian  is  achievable.  The  properties  of 
the  ensemble  from  which  the  states  and  measurement  observ¬ 
ables  are  chosen  would  determine  the  parameter  Ss  and  con¬ 
sequently  the  performance  of  the  algorithm.  The  linear  inde¬ 
pendency  of  the  $  matrix  rows  for  a  random  set  of  local  state 
preparations  and  observables  can  be  guaranteed  by  a  polyno¬ 
mial  level  of  computational  overhead  before  conducting  the 
experiments. 

A  certification  for  the  nearly  sparsity  assumption  can  be  ob¬ 
tained  fromEqs.(8)  and  (9)  as  follows:  Suppose  h*m  is  the  al¬ 
gorithm’s  outcome  for  to  configurations.  The  nearly  sparsity 
assumption  is  certified  on  the  fly  during  the  experiment,  if  the 
estimation  improvement  ||/i^+1  —  h* J|  converges  to  zero  for 
a  polynomially  large  total  number  of  configurations. 

V.  PHYSICALLY  NEARLY  SPARSE  HAMILTONIAN 

Although  physical  systems  at  the  fundamental  level  involve 
local  two-body  interactions,  many-body  Hamiltonians  often 
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describe  quantum  dynamics  in  a  particular  representation  or 
in  well  defined  approximate  limits.  The  strength  of  the  non¬ 
local  A;- body  terms  typically  is  much  smaller  than  the  two- 
body  terms  with  strength  J  and  decreases  with  the  number  k. 
For  a  fixed  sparsity  threshold  ij,  kv  is  defined  as  the  largest 
number  k  for  which  fc-body  terms  have  strength  larger  than 
77  J.  Then  the  number  of  the  elements  of  a  s-sparse  approxi¬ 
mation  of  a  n-body  Hamiltonian  grows  linearly  as  0(ng(kri)), 
where  the  g(kv)  is  determined  by  the  geometry  of  the  system. 

A  general  class  of  many-body  interactions  arises  when 
we  change  the  basis  for  a  bosonic  or  ferminoic  system  ex¬ 
pressed  by  a  (typically  local)  second-quantized  Hamiltonian 
to  a  Pauli  basis,  e.g.,  via  a  Jordan- Wigner  transformation. 
For  fermionic  systems  the  interactions  are  imposed  physically 
from  Coulomb’s  force  and  Pauli  exclusion  principle.  The 
second-quantized  Hamiltonian  for  these  systems  can  be  gen¬ 
erally  written  as: 

H  —  ^  '  bpqQ,p  Qq  T  ^  '  bpqrsCLp  dq  drCls1  (10) 

p,q  p,q,r,s 

where  the  annihilation  and  creation  operators  ( a:j  and  at  re¬ 
spectively)  satisfy  the  fermionic  anti-commutation  relations: 
{a,i,  a }  =  Stj  and  {di,  dj}  =  0  [26].  For  example,  in  chem¬ 
ical  systems  the  coefficients  hpq  and  hpqrs  can  be  evaluated 
using  the  Hartree-Fock  procedure  for  N  single-electron  ba¬ 
sis  functions.  The  Jordan- Wigner  transformation  can  then  be 
used  to  map  the  fermionic  creation  and  annihilation  operators 
into  a  representation  in  terms  of  Pauli  matrices  ax,  av,az. 
This  allows  for  a  convenient  implementation  on  a  quantum 
computer,  as  was  demonstrated  recently  for  the  efficient  sim¬ 
ulation  of  chemical  energy  of  molecular  systems  [27].  An  im¬ 
portant  example  of  a  Coulomb  based  Hamiltonian  is  the  spin- 
coupled  interactions  in  quantum  dots  which  has  the  following 
Pauli  representation: 

H  =  ^2  Kj,k,  -aA  ®  aB  ®  aC  ■  ■  ■  *  (11) 

where  A,B,C,  -  •  •  indicate  the  location  of  the  quantum  dots, 
,  <7*s  are  Pauli  operators,  and  h,!hk.---  generally  represents  a 
many-body  spin  interacting  term.  In  practice,  these  Hamil¬ 
tonians  are  highly  sparse  or  almost  sparse  due  to  symmetry 
considerations  associated  with  total  angular  momentum  [28]. 
For  example  the  Hamiltonian  for  the  case  of  four  quantum 
dots  (A,  B,  C,  D)  takes  the  general  form  [28]: 

H exchange  —  J  E  (Ji.cFj  +  J'[(<ja-<JB)(ctc-ctd) 

A<i<j<D 

+  ( aA-<Jc)(vB-<JD )  +  ( aA-crD)(aB-ac )],  (12) 

Another  class  of  effective  many-body  interactions  often 
emerge  in  a  perturbative  and/or  short  time  expansion  of  dy¬ 
namics,  such  as  effective  three -body  interactions  between 
atoms  in  optical  lattices  [29]  that  we  study  in  this  work. 

Next,  we  simulate  the  performance  of  our  algorithm  for  es¬ 
timation  of  such  sparse  many-body  Hamiltonians  in  optical 
lattices  [29]  and  quantum  dots  [28], 


A.  Three-body  interactions  in  optical  lattices 

An  optical  lattice  is  a  periodic  potential  formed  from  in¬ 
terference  of  counterpropagating  laser  beams  where  neutral 
atoms  are  typically  cooled  and  trapped  one  per  site.  Consider 
four  sites  in  two  adjacent  building  blocks  of  a  triangular  op¬ 
tical  lattice  filled  by  two  species  of  atoms  [29].  The  interac¬ 
tion  between  atoms  is  facilitated  by  the  tunneling  rate  J  be¬ 
tween  neighboring  sites  and  collisional  couplings  U  when  two 
or  more  atoms  occupy  the  same  site.  For  each  site  an  effective 
spin  is  defined  by  the  presence  of  one  type  of  atom  as  the  up¬ 
state  f  and  the  presence  of  the  other  type  as  the  down-state  j,. 
Three-body  interactions  between  atoms  in  a  triangular  optical 
lattice  can  be  significant.  The  effective  Hamiltonian  for  this 
system  is  studied  in  Ref.  [29].  The  on-site  collisional  interac¬ 
tion  U ,  and  tunneling  rates  ./  =  .B  =  2. B  are  taken  to  be  the 
same  in  all  sites,  also  U  =  U jq-  =  t/jq  =  2A2Ufi  =  10kHz. 
The  effective  Hamiltonian  of  the  4-spin  system  is 

Hopt-latt=  ^2  bla'jaj+l+b2(Tjaj+laj+2  (13) 

j,cx=x,y,z 

where  { 6" ,  62  }  are  functions  of  {J,U}  and  their  explicit 
forms  are  given  in  appendix  D.  The  ratio  77  =  \J/U\  quan¬ 
tifies  the  sparsity  level.  For  a  fixed  value  of  U,  a  smaller  J 
leads  to  weaker  three-body  interactions  and  therefore  a  higher 
level  of  sparsity.  As  expected,  this  enhances  the  algorithm 
performance. 

We  assume  that  the  system  can  be  initialized  in  a  ran¬ 
dom  product  state  | tpk)  =  ®  •••  0  | Bt)-  where  \Bk) 

are  drawn  from  the  Fubini-Study  metric  induced  distribution. 
The  required  observables  for  the  algorithm  are  uniformly  se¬ 
lected  from  single  qubit  Pauli  operators  {erf,  cr?,  of}.  This 
choice  of  states  and  observables  allows  for  Ss  «  0.37  < 
\/2  —  1.  Let  us  denote  the  extracted  Hamiltonian  and  the 
true  Hamiltonian  by  H*  and  HtrUe ,  respectively.  Here,  the 
performance  of  the  algorithm  is  defined  by  the  relative  error 
1  -  \\H*  -  Htrue\\fro/\\Htrue\\fro.  The  results  for  different 
number  of  configurations  are  depicted  in  Fig.  (1),  for  various 
values  of  J.  As  evident  in  Fig.(l),  performance  accuracy  of 
above  94%  can  be  obtained  with  only  80  settings  significantly 
smaller  than  approximately  6  x  104  configurations  required  in 
QPT. 

The  robustness  of  this  scheme  was  also  investigated  for 
10%  random  error  in  simulated  experimental  data  leading  to 
about  a  5%  reduction  in  the  overall  performance. 


B.  Four-body  interactions  in  quantum  dots 

Another  important  class  of  effective  many-body  Hamilto¬ 
nians  can  be  obtained  for  electrons  in  quantum  dots  coupled 
through  an  isotropic  (Heisenberg)  or  anisotropic  exchange  in¬ 
teraction.  For  example  the  Hamiltonian  for  the  case  of  four 
quantum  dots  (A,  B,  C.  D )  takes  the  general  form  Eq.  (12). 
The  first  term  in  the  summation  is  a  two-body  Heisenberg  ex¬ 
change  interaction  and  the  last  three  terms  are  four-body  spin 
interactions.  In  certain  regimes,  the  ratio  \J'/J\  can  reach  up 
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FIG.  1:  The  Hamiltonian  estimation  average  performance  is  illus¬ 
trated  for  a  system  of  four  adjacent  sites  in  an  optical  lattice  for  dif¬ 
ferent  tunneling  rates,  J,  and  collisional  coupling  U  =  10 kHz.  The 
error  bars  demonstrate  the  standard  deviation  of  the  performance  due 
to  the  random  and  independent  selection  of  m  configurations  (shown 
only  for  J  =  5kHz).  Performance  accuracy  of  above  90%  with 
only  60  settings  is  achievable  for  J  =  1  kHz,  which  is  significantly 
smaller  than  about  6  x  104  required  experimental  configurations  in 
QPT. 


to  16%.  The  amplitude  of  r/  =  \J'/J\  determines  the  sparsity 
level  of  the  Hamiltonian. 

Here  we  use  an  efficient  modification  of  signal  recovery 
referred  as  ’’reweighted  li  -minimization”  which  is  described 
in  appendix  E.  The  performance  of  this  algorithm  is  demon¬ 
strated  in  Fig.  (2)  that  shows  a  significant  reduction  of  the 
required  number  of  settings  in  contrast  to  the  standard  QPT. 


FIG.  2:  Estimation  of  the  exchange  interaction  Hamiltonian  for  four 
electrons  in  quantum  dots.  The  average  performance  of  the  proce¬ 
dure  is  illustrated  for  different  values  of  |  J' /  J\  with  50  iterations  of 
the  H-reweighted  minimization.  The  standard  deviations  are  shown 
only  for  |  J'  j  J\  =  0.1  It  is  demonstrated  that  only  60  different  con¬ 
figurations  are  sufficient  for  estimating  the  unknown  Hamiltonian 
with  an  accuracy  above  95%  for  \J'/J\  =  0.05,  instead  of  about 
6  x  104  required  settings  via  QPT. 


VI.  V.  CHARACTERIZATION  OF  HAMILTONIAN  FINE 
STRUCTURES  AND  SYSTEM-BATH  INTERACTIONS 

A.  Hamiltonian  fine  estimation 

In  many  systems  a  primary  model  of  the  interactions  is 
often  known  through  physical  and/or  engineering  consider¬ 
ations.  Starting  with  such  an  initial  model  we  seek  to  im¬ 
prove  our  knowledge  about  the  Hamiltonian  by  random  mea¬ 
surements.  Let’s  assume  the  initial  guess  about  the  Hamil¬ 
tonian  Hq  is  close  to  the  true  form  Htrue  that  is  |  j  A  = 
Htme  —  H0\\ -C  1 1 Htrue 1 1 .  Therefore  for  a  perturbative  treat¬ 
ment  we  demand  t  \  \  A\ |  <C  1,  which  is  a  much  weaker  require¬ 
ment  compared  to  f||fTtr.Me||  -C  1.  We  can  approximate  Eq. 
(1)  in  the  paper  to  find 

Pjk  «  {i>k\  Mj  \lpk) 

+  i  (^k\ [  f  eisH°Ae~isH°ds,  M?]  |^} ,  (14) 

J  o 

where  M{-  =  eltH° Mje~ltH°  [31].  This  equation  is  linear 
in  A,  consequently,  in  a  similar  fashion  as  above,  the  com¬ 
pressed  sensing  analysis  can  be  applied  for  efficient  estima¬ 
tion  of  the  fine  structure  of  Hamiltonians. 


B.  Characterizing  system-bath  interactions 

The  identification  of  a  decoherence  process  is  a  vital  task 
for  quantum  engineering.  In  contrast  to  the  usual  approach  of 
describing  dynamics  of  an  open  quantum  system  by  a  Kraus 
map  or  a  reduce  master  equation,  here  we  use  a  microscopic 
Hamiltonian  picture  to  efficiently  estimate  the  system-bath 
coupling  terms  generating  the  overall  decoherence  process. 
However  since  we  consider  a  full  dynamics  of  the  system  and 
bath,  this  method  can  be  applied  to  a  finite  size  environment 
such  as  a  spin  bath,  or  a  surrogate  Hamiltonian  modeling  of  a 
infinite  bath.  In  the  latter  case  a  harmonic  bath  of  oscillators 
is  approximated  by  a  finite  spin  bath  [32]. 

Consider  an  open  quantum  system  with  a  total  Hamiltonian: 

H  =  Hg  <8>  Ib  +  Is  ®  Hb  +  Hsb  (15) 

and 

Hsb  =  '^j^p,qSp®  Bq  (16) 

P,Q 

where  Hs  ( Hb )  denotes  the  system  (bath)  free  Hamilto¬ 
nian  and  Hsb  is  the  system-bath  interaction  with  coupling 
strengths  {Xpq},  and  a  complete  operator  basis  of  the  system 
and  bath  being  {S^}  and  { Bq } ,  respectively. 

We  develop  a  formalism  to  estimate  Xvq  parameters  in  the 
weak  system-bath  coupling  regime  and  with  the  sparsity  as¬ 
sumption  that  a  few  number  of  X;i  q  have  a  significant  value. 

The  Liouvilian  dynamical  equation  is 

-^PSb{ t)  =  (£0  +  Xpq£pq)[psB{t)}  (17) 

pq 
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where  £0[.]  =  -i[Hs  <g>  Ib  +  Is  ®  HB,  ■]  and  Cpq[]  = 
— i[Sp®Bq , .].  In  the  regime  of  weak  coupling  to  a  finite  bath, 
||ifsB||  <  min{||ifs||,  ||ifs||},  the  Liouvillan equation  (17) 
can  be  solved  perturbatively  if  time  t  satisfies  f||f?sB||  <C  1. 
For  an  initial  system  density  state  pk,  using  the  matrix  identity 
given  in  Ref.  [31]  we  find  the  measurement  outcomes  as 

Pjk  ~  tr(pkMj )  (18) 

+  '^2^pqtr([f  dse(t~s)CoCpqesCo{pk\,Mj\) 

pq  0 

where  Mj  is  a  system  only  observable.  This  affine  function 
between  the  outcomes  pjk  and  coupling  parameters  {Apg} 
is  similar  to  Eq.(2)  in  the  paper  for  Hamiltonian  estimation. 
Consequently,  the  compressed  sensing  algorithm  can  be  em¬ 
ployed  for  computing  {AP9}s. 

VII.  OUTLOOK 

We  have  introduced  an  efficient  and  robust  experimental 
procedure  for  the  identification  of  nearly  sparse  Hamiltoni¬ 
ans  using  only  separable  (local)  random  state  preparations  and 
measurements.  There  are  a  number  of  future  directions  and 
open  problems  associated  with  this  work.  It  is  not  known 
how  the  performance  of  the  algorithm  depends  on  the  distribu¬ 
tion  of  the  ensemble  from  which  the  states  and  measurement 
observables  are  drawn.  Also,  a  general  closed-loop  learn¬ 
ing  approach  for  updating  the  knowledge  of  sparsity  basis  of 
an  arbitrary  Hamiltonian  is  an  interesting  open  problem  that 
will  be  of  importance  for  generic  compressed  system  identi¬ 
fication.  The  presented  method  for  Hamiltonian  estimation 
is  promising  for  drastic  reduction  in  the  number  of  experi¬ 
mental  configurations.  However  the  classical  resources  for 
post-processing  is  not  scalable.  A  fully  scalable  Hamiltonian 
estimation  method  might  be  achievable  via  a  hybrid  of  com¬ 
pressed  sensing  and  DMRG  (Density-Matrix  Renormalization 
Group)  methods  [33].  A  compressed  tomography  method  can 
also  be  developed  for  nearly  sparse  quantum  processes  [34]. 
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Appendix  A:  vectors  and  operator  norm 

In  this  paper  we  use  the  following  different  norms: 

For  a  vector  x, 

iMii2  =  v^,\\x\\h  =  \xj\.  (Ai) 

i 

For  a  matrix  A, 

1 1^1 1  spec  =  \j  Xmax(Ai  A)  (A2) 


where  Xmax  means  largest  eigenvalue. 

||A||/ro  =  J  trace(Ai  A)  (A3) 


Appendix  B:  Analysis  of  the  short  time  approximation 

The  short  time  monitoring  of  the  system’s  dynamics  re¬ 
quires  a  prior  knowledge  of  the  dynamical  time  scales.  In 
the  solid-state  quantum  devices,  in  particular  in  the  context 
of  quantum  control  and  quantum  information-processing,  the 
time-scale  of  single  qubit  rotations  is  typically  on  the  order 
of  1-10  ns.  The  switching  time  for  exchange  interactions 
varies  among  different  solid-state  systems.  For  superconduct¬ 
ing  phase  qubit  the  duration  of  a  swap  gate  is  about  10  ns  [35]. 
For  electron-spin  qubits  in  quantum  dots  and  in  donor  atoms 
(Heisenberg  models)  [36-38],  and  also  for  quantum  dots  in 
cavities  (anisotropic  exchange  interactions)  [39]  the  coupling 
time  is  between  10- lOOps,  while  for  exciton-coupled  quan¬ 
tum  dots  (XY  model)  and  Forster  energy  transfer  in  multichro- 
mophoric  complexes  the  relevant  time  scale  is  in  the  order  of 
lps.  Next  we  rigorously  derive  bound  on  the  evolution  time  t, 
that  guarantees  the  validity  of  the  short  time  approximation. 

For  an  input  state  |t/'fc),  the  expectation  value  of  an  observ¬ 
able  Mj  is 

Pjk  =  < ipk{t)\Mj  | il>k(t))  =  (ipk\ elHtMje~lHt  \ipk)  (Bl) 

Considering  the  expansion  of  the  propagator  e~lHt  =  I  — 
itH  —  A t2H 2  +  ....  we  find 

Pjk  =  (PI  Mj  |p)  +  it  (p |  [ H ,  Mj]  | ijjk) 

-  t-(^k\[H,[H,Mj]}\^k)  +  ...  (B2) 

Therefore,  for  the  linearization  assumption,  it  is  sufficient 
to  have  for  the  Z’th  term 

l  times 

tlmin(ipk\  [H,  [H,  [...,Mj]]\  \ipk)  < 

J 

tlmm\\[H,  [H,  [..., Mj}]]\\spec  <  1,  VZ.  (B3) 

3 

A  tighter  bound  can  be  found  for  operators  {Mj}  from  a 
POVM  as 

\\[H,[H,[...,Mj}}}\\spec<2l\\H\\lspec  (B4) 

To  derive  this  we  use 
IP,B]||  spec  —  UB II 

spec  +  ||-B4l||spec  <  2 1 1  -*4 1 1  spec  1 1 B 1 1  spec 

(B5) 

and  P|| spec  =  \\AA%pec. 

This  gives  a  single  bound  sufficient  for  linearization:  t  <C 
—  1 1  FT  II  — 1 

2  II'  II spec • 
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Appendix  C:  RIP  from  a  concentration  inequality 

In  this  work,  we  generalize  the  standard  compressed  sens¬ 
ing  algorithm  such  that  the  necessity  for  independent  random¬ 
ness  in  all  elements  of  the  measurement  matrix,  cj>,  can  be 
avoided.  A  common  approach  to  establish  RIP  ([24])  for  a 
matrix  $  is  by  introducing  randomness  in  the  elements  of  this 
matrix.  This  approach  benefits  from  measure  concentration 
properties  of  random  matrices.  In  classical  signal  processing 
each  element  &jk,a  can  be  independently  selected  from  a  ran¬ 
dom  distribution  such  as  Gaussian  or  Bernoulli.  Whereas  in 
the  Hamiltonian  estimation  formulation  (Eq.  (4)  in  the  paper) 
there  is  no  freedom  for  independent  selection  of  the  <l»  matrix 
elements. 

Here  we  prove  the  concentration  inequality  that  we  em¬ 
ployed  for  establishing  the  restricted  isometry  property. 

Though  $  is  a  random  matrix,  because  it  is  constructed 
from  quantum  states  and  observables  of  a  finite  dimensional 
system,  it  is  bounded.  Thus  we  are  able  to  apply  Hoeffd- 
ing’s  concentration  inequality:  If  Vi, . . . ,  vrn  are  independent 
bounded  random  variables  such  that  Prob.{uj  £  [a*,  bi]}  =  1, 
then  for  S  =  J2i  vi> 


Prob.lS  -  E (S)  >t}< 

Prob.{5  -  E(S)  <  -t}  <  e“2<2/  (Cl) 


M  can  be  scaled  by  any  real  number,  a  sufficient  condition  is 
g/ f  <  (1  +  <5)/(l  —  <5).  For  the  simulations  in  this  paper,  this 
ratio  becomes  2.176. 


Appendix  D:  4-sites  optical  lattice  Hamiltonian 


Let  us  consider  four  sites  in  two  adjacent  building  blocks 
of  a  triangular  optical  lattice  filled  by  two  species  of  atoms,  f 
and  !.  Atoms  interact  by  tunneling  between  neighboring  sites, 
./  ’  and  dK  and  through  collisional  couplings  in  the  same  site, 
U.  The  Hamiltonian  for  such  system  can  be  written  as  [29]: 


/t2  i  t4-2  r f3  I  jl3 

Hopt_latt  =  YJ(^—± — 0.27 


-( 


U 

3 

2.1(J'f  +  J±)  Jt  J±  jtjl 


U2 


J<T„-  O 


3U3+ 1 


u2 


+  — )W+  l+^+l) 


Aj^3 4 5 6-Ji3  z  z  z 

z2  °-14 - Jp - ajaj+laj+2 


—  0  •  6  (aj  aj+laj+2  +  OjVj+lVj+i), 

(Dl) 


for  any  t  >  0.  (Here  E  denotes  the  expectation  value.)  Set 
Vi  =  |<^fx|1 2  for  a  row  fa.  Then  with  S  =  J2i  vi  =  ||$2;||22, 
we  get  Vx, 

Vi  =  x\(f>i<t>\)x  £  (l/m)[w;,u>„]||x||22 

E(S)  =  E||$x||22  G  [f,g] \\x\\l  (C2) 

for  constants  wi,wu ,  /,  g.  Note  that  /  and  g  are  the  min  and 
max  singular  values  of  E(<fO<I>).  From  (C2)  we  find  Vi+  ,t-  > 
0  and  Vx, 

Prab.lS1  —  g||x||f2  >  t+}  <  Probes'  —  E(5)  >  f+} 
Prob.jS'  —  / 1 1 x 1 1 <  —  t-}  <  Probes'  —  E(5)  <  — f_} 

These  together  with  (Cl)  and  (C2),  and  the  choice  of  t+  = 
(<5+  1  -g)||x||f2  and  f_  =  {f  -  1  +  S)\\x\\f2  yields 

—  2m(<5+e)^ 

Prob.{||$x||2  -  ||x||2|  >  <5||x||22}  <  2e  (C3) 

with  e  =  min{l  —  g,f  —  1}.  To  ensure  that  f+,f_  >  0, 
we  need  1  —  5<f<g<l  +  5.  Since  the  observable 


where  oi /v,z  are  Pauli  operators. 


Appendix  E:  Reweighted  C -minimization 

In  order  to  simulate  our  alogrithm  performance  for  estimat¬ 
ing  the  above  Hamiltonian  we  use  an  iterative  algorithm  that 
outperforms  the  standard  l  \  norm  minimization  [30].  This 
procedure  entails  initializing  a  weight  matrix  W  =  I ,p  and 
a  weight  factor  a  >  0,  and  repeating  the  following  steps  until 
convergence  is  reached: 

1.  Solve  for  ft.,  minimize  HWftHjj 

subject  to  || p  —  $ft||;2  <  e. 

2.  Update  weights 

W  =  diag(\/ (|fti|  +  a), ...,  l/(|ftd2|  +  &)).  (El) 

where  ft  =  vec(ftj)  is  the  Hamiltonian  vectorized  form.  4>  is 
the  measurement  matrix  and  p'  is  the  experimental  data  with 
a  noise  threshold  e. 
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